This workshop presented the basics behind the application of modern machine learning algorithms. We discussed a framework for reasoning about when to apply various machine learning techniques, emphasizing questions of over-fitting and under-fitting, regularization, interpretability, supervised and unsupervised methods, and the handling of missing data. The principles behind the algorithms (the why and how of using them) were discussed, while some of the underlying mathematical detail, including proofs, was not. Unsupervised machine learning algorithms presented included k-means clustering, principal component analysis (PCA), and independent component analysis (ICA). Supervised machine learning algorithms presented included support vector machines (SVM), classification and regression trees (CART), boosting, bagging, and random forests. Imputation, the lasso, and cross-validation concepts were also covered. The R programming language was used for examples, though participants did not need prior exposure to R.
Prerequisites: undergraduate-level linear algebra and statistics; basic programming experience (R/Matlab/Python).
- Basic Concepts and Intro to Supervised Learning: linear and logistic regression
- Penalties, regularization, sparsity (lasso, ridge, and elastic net)
- Unsupervised Learning: clustering (k-means and hierarchical) and dimensionality reduction (Principal Component Analysis, Independent Component Analysis, Self-Organizing Maps, Multi-Dimensional Scaling)
- Unsupervised Learning: non-negative matrix factorization (NMF) and text classification (bag-of-words model)
- Supervised Learning: loss functions, cross-validation (bias-variance trade-off and learning curves), imputation (k-nearest neighbors and SVD), imbalanced data
- Classification and Regression Trees (CART)
- Ensemble methods (Boosting, Bagging, and Random Forests)
- Support Vector Machines (SVM)
- Deep learning: Neural Networks (Feed-Forward, Convolutional, Recurrent) and training algorithms
Alexander is a PhD candidate in the Institute for Computational and Mathematical Engineering at Stanford. His research, under Prof. Carlos Bustamante, chair of the Department of Biomedical Data Science at Stanford Medical School, focuses on applying machine learning techniques to medicine and human genetics. Prior to Stanford, he earned his bachelor's degree in Chemistry and Physics from Harvard and an MPhil from the University of Cambridge. He worked for several years on superconducting and quantum computing architectures at Northrop Grumman's Advanced Technologies research center in Linthicum, MD. In his free time he enjoys sailing.
Gabriel Maher is a PhD student at the Institute for Computational and Mathematical Engineering at Stanford University. His research, with Dr. Alison Marsden at the Cardiovascular Biomechanics Computation Lab, applies deep learning to cardiovascular medical image analysis.