Product Details
Data Analytics: A Small Data Approach is suitable for an introductory data analytics course and helps students understand some of the main statistical learning models. It uses many small datasets to guide students through working out pencil-and-paper solutions of the models and then comparing them with results obtained from established R packages. Also, because data science practice is a process that should be told as a story, the book includes extensive course material on exploratory data analysis, residual analysis, and flowcharts for developing and validating models and data pipelines.
The main models covered in this book include linear regression, logistic regression, tree models and random forests, ensemble learning, sparse learning, principal component analysis, kernel methods including the support vector machine and kernel regression, and deep learning. Each chapter introduces two or three techniques. For each technique, the book first highlights the intuition and rationale, then shows how mathematics is used to articulate that intuition and formulate the learning problem. R is used to implement the techniques on both simulated and real-world datasets. Python code is also available at the book’s website: http://dataanalyticsbook.info.
Table of Contents
1. INTRODUCTION
Who will benefit from this book
Overview of a Data Analytics Pipeline
Topics in a Nutshell
2. ABSTRACTION
Regression & tree models
Overview
Regression Models
Tree Models
Remarks
Exercises
3. RECOGNITION
Logistic regression & ranking
Overview
Logistic Regression Model
A Ranking Problem by Pairwise Comparison
Statistical Process Control using Decision Tree
Remarks
Exercises
4. RESONANCE
Bootstrap & random forests
Overview
How Bootstrap Works
Random Forests
Remarks
Exercises
5. LEARNING (I)
Cross validation & OOB
Overview
Cross-Validation
Out-of-bag error in Random Forest
Remarks
Exercises
6. DIAGNOSIS
Residuals & heterogeneity
Overview
Diagnosis in Regression
Diagnosis in Random Forests
Clustering
Remarks
Exercises
7. LEARNING (II)
SVM & ensemble learning
Overview
Support Vector Machine
Ensemble Learning
Remarks
Exercises
8. SCALABILITY
LASSO & PCA
Overview
LASSO
Principal Component Analysis
Remarks
Exercises
9. PRAGMATISM
Experience & experimental
Overview
Kernel Regression Model
Conditional Variance Regression Model
Remarks
Exercises
10. SYNTHESIS
Architecture & pipeline
Overview
Deep Learning
inTrees
Remarks
Exercises
CONCLUSION
APPENDIX: A BRIEF REVIEW OF BACKGROUND KNOWLEDGE
The normal distribution
Matrix operations
Optimization