Hands-On Exploratory Data Analysis with R

The benefits of EDA across vertical markets

Every organization today produces and relies on a lot of data in its everyday processes. Before making assumptions and decisions based on this data, an organization needs to understand it. EDA enables data analysts and data scientists to bring this information to the right people. It is the most important step for a data-driven organization to focus its energy and resources on.

Having practical tools in hand for carrying out EDA helps data analysts and data scientists produce reproducible, well-informed analysis results. R is one of the most popular data analysis environments, so it makes sense to equip your data analysis teams with powerful R techniques so that they can make the most of their EDA skills.

At the time of writing this book, more than 13,000 R packages are available on CRAN. You can find R packages for all kinds of tasks and domains. In this book, we concentrate on a particular set of R packages that the R community considers the best for EDA. Some of the packages that we cover are not directly related to EDA, but they are relevant to other stages of dealing with the data, as indicated by the following diagram:

We will introduce these packages briefly in this chapter and go into more detail as the book progresses. The stages are as follows:

  • Pre Modeling Stage: This stage involves manipulating the data frame through Data Visualization, Data Transformation, Missing Value Imputation, Outlier Detection, Feature Selection, and Dimension Reduction.
  • Modeling Stage: This intermediate stage involves Continuous Regression, Ordinal Regression, Classification, Clustering, and Time Series with Survival.
  • Post Modeling Stage: This is the final stage, where interpreting the output takes the highest priority. It includes the implementation of various algorithms such as clustering, classification, and regression.
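
To make the three stages more concrete, here is a minimal sketch that walks through a few of the listed tasks. It rests on assumptions that are not part of the text above: it uses only base R functions and the built-in airquality data set rather than the packages covered later in the book, and the choice of median imputation, the boxplot rule for outliers, and a simple linear model is purely illustrative.

```r
# Illustrative sketch only: base R and the built-in airquality data set
# are assumptions made for this example, not the book's chosen packages.
data(airquality)
df <- airquality

# --- Pre Modeling Stage ---
# Inspect the structure and summary statistics of the data frame.
str(df)
summary(df)

# Missing value imputation: count NAs per column, then fill with the median.
na_counts <- colSums(is.na(df))
print(na_counts)
for (col in names(df)) {
  df[[col]][is.na(df[[col]])] <- median(df[[col]], na.rm = TRUE)
}

# Outlier detection: values beyond 1.5 * IQR from the hinges (boxplot rule).
ozone_outliers <- boxplot.stats(df$Ozone)$out
print(ozone_outliers)

# Data transformation: log-transform the Ozone column.
df$log_ozone <- log(df$Ozone)

# Dimension reduction: principal components of the scaled measurements.
pca <- prcomp(df[, c("Ozone", "Solar.R", "Wind", "Temp")], scale. = TRUE)
summary(pca)

# --- Modeling Stage ---
# Continuous regression: model Ozone as a function of Temp and Wind.
fit <- lm(Ozone ~ Temp + Wind, data = df)

# --- Post Modeling Stage ---
# Output interpretation: coefficients, standard errors, and fit quality.
print(summary(fit))
```

Each step maps onto one item from the stage list, so individual lines can be swapped for a dedicated package function without changing the overall flow.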