Today we all have access to a lot of data. Even more crucially, through our personal computers and powerful free software packages, we also have easy access to the means to process that data and extract intelligence from it. Quite needlessly, though, the necessary knowledge and skills remain the exclusive preserve of a few, a situation this book sets out to change.
Although most data analytics techniques have a mathematical basis, people with a grasp of high school mathematics can gain a deep intuitive understanding of the underlying techniques and apply them correctly and effectively. To make this possible, the book:
- Focuses on intuitive explanations with examples, while avoiding deep mathematics;
- Provides numerous examples, tables and figures (over 200 figures and 110 tables) to help readers grasp the concepts and techniques;
- Introduces the R statistical programming environment and provides step-by-step guidance for learning R and applying it to the techniques covered. After working through the book, readers will be able to apply these techniques independently to their own data and will have mastered an important subset of the R language;
- Recognizing that people master new topics only by doing, provides many instructive labs, lab assignments and review questions with detailed guidance and explanations. Rather than just listing the steps ("what" to do), it also explains "why";
- Makes all the data files needed for the labs and lab assignments available as free downloads from the book's web site;
- Comes with many convenience functions that automate what might otherwise be confusing procedures, shielding readers who are new to computer programming from unnecessary complexity.
The book covers the following topics:
- A quick introduction to R programming that assumes no prior background in R;
- Important data analytics concepts;
- Exploratory data analysis and graphing with R;
- Affinity analysis;
- Classification techniques such as k-nearest neighbors, naive Bayes and classification trees;
- Regression techniques such as simple and multiple linear regression, k-nearest neighbors for regression, and regression trees;
- Time series analysis; and
- Data reduction techniques such as principal component analysis (PCA) and cluster analysis (k-means clustering).
After completing the book, readers will have gained extensive hands-on experience along with a strong intuitive understanding of the underlying theory.