
ICME Summer Workshops 2022 | Fundamentals of Data Science


2022 Summer Workshop Series will be online via Zoom Aug 1-19

ICME’s annual Summer Workshop Series will offer a variety of virtual data science and AI courses, taught live via Zoom by world-renowned Stanford faculty and Stanford-affiliated instructors. The series is open to the general public worldwide. Discounts are offered to students, staff, and faculty from all schools, as well as to ICME industry partners.

The series offers:

  • Intermediate workshops such as Data Privacy and Ethics, Intermediate Topics in Machine Learning & Deep Learning, and Deep Learning for Natural Language Processing.
  • Twelve workshops offered over three weeks, from August 1-19.
  • Half-day workshops (from either 8-11 am or 1-4 pm Pacific time) spread over two days.

Participants taking four or more workshops can earn a Stanford ICME Fundamentals of Data Science Summer Workshops "Certificate of Completion."

ICME Summer Workshops Class Information

Linear Algebra

Monday, August 1 & Tuesday, August 2, 2022  |  8:00 AM - 11:00 AM PDT

Linear algebra forms the foundation of many algorithms in computational mathematics and engineering, and data science is no exception. In this workshop, we explore the beauty and power of linear algebra, and discuss the most critical linear algebra concepts and algorithms used in data science. The concepts and algorithms will be introduced through and motivated by common data problems, including fitting, compressing, searching and recommending.
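To give a flavor of the kinds of problems mentioned above (this sketch is illustrative only, not workshop material), here is how fitting and compressing both reduce to core linear algebra operations in NumPy, via least squares and the truncated SVD:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fitting: solve the least-squares problem min ||Ax - b|| for a line y = c0 + c1*t.
t = np.linspace(0, 1, 50)
b = 2.0 + 3.0 * t + 0.1 * rng.standard_normal(50)
A = np.column_stack([np.ones_like(t), t])      # design matrix
coef, *_ = np.linalg.lstsq(A, b, rcond=None)   # [intercept, slope], near [2, 3]

# Compressing: keep only the top-k singular values of a matrix
# (the best rank-k approximation, the idea behind many compression schemes).
M = rng.standard_normal((100, 80))
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 10
M_k = U[:, :k] * s[:k] @ Vt[:k, :]             # rank-10 approximation of M
```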

Margot Gerritsen

About the Instructor: Professor Margot Gerritsen is a Professor Emerita at Stanford and an affiliated ICME faculty member. She is Co-Founder and current Executive Director of the global Women in Data Science initiative (WiDS, widsconference.org). From 2010 to 2018, she served as the Director of ICME, and from 2015 to 2020 as Senior Associate Dean in the School of Earth, Energy and Environmental Sciences. She is also currently the Chair of the Board of the Society for Industrial and Applied Mathematics (siam.org). She received her Ph.D. in Scientific Computing and Computational Mathematics at Stanford in 1997. After five years as a faculty member at the University of Auckland, she returned to Stanford in 2001. Margot specializes in computational modeling of fluid flow processes, numerical analysis, and data science. While at Stanford, she taught several of the ICME core and service courses in numerical analysis and linear algebra, as well as courses in renewable energy. Margot's favorite area of computational mathematics is linear algebra, and she is very much looking forward to this short course. In her free time, Margot can mostly be found outdoors, hiking, biking, gardening, and riding her motorbike with her husband Paul, or indoors reading or picking her banjo. Margot has a son who recently graduated from Stanford.

Back to Schedule Overview


Introduction to Statistics

Monday, August 1 & Tuesday, August 2, 2022  |  1:00 PM - 4:00 PM PDT

Statistics is the science of learning from data. This workshop will help you to develop the skills you need to analyze data and to communicate your findings. There won't be many formulas in the workshop; rather, we will develop the key ideas of statistical thinking that are essential for learning from data.

We will discuss the main tools for descriptive statistics which are essential for exploring data, with an emphasis on visualizing information. We will explain the important ideas about sampling and conducting experiments. Then we will look over some important rules of probability and discuss normal approximation and the central limit theorem. We will show you the important concepts and pitfalls of regression and how to do inference with confidence intervals and tests of hypotheses. You will learn how to analyze categorical data and discuss one-way analysis of variance. Finally, we will look at reproducibility, data snooping and the multiple testing fallacy, and how to account for multiple comparisons. These issues have become particularly important in the era of big data.

Broadly, there are three main reasons why statistical literacy is essential in data science: First, it provides the skills to assess whether the data are sufficient to answer the questions at hand. Second, it establishes a rigorous framework for quantifying uncertainty. And finally, it provides techniques for effectively communicating the findings of your analyses. This workshop equips you with the important tools in all of these areas. It is the statistical foundation on which the recent exciting advances in machine learning are built.
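Two of the ideas above, the central limit theorem and normal-approximation confidence intervals, can be seen in a few lines of simulation (a hedged sketch for illustration, not workshop material):

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw many samples from a skewed (exponential) population and average each one.
# The central limit theorem says these sample means are approximately normal,
# centered at the population mean (1.0) with standard error 1/sqrt(50).
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

# A 95% normal-approximation confidence interval for the mean of one sample.
x = rng.exponential(scale=1.0, size=50)
se = x.std(ddof=1) / np.sqrt(len(x))
ci = (x.mean() - 1.96 * se, x.mean() + 1.96 * se)
```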

Guenther Walther

About the Instructor: Professor Guenther Walther studied mathematics, economics, and computer science at the University of Karlsruhe in Germany and received his Ph.D. in Statistics from UC Berkeley in 1994. His research has focused on statistical methodology for detection problems, shape-restricted inference, and mixture analysis, and on statistical problems in astrophysics and in flow cytometry.

He received a Terman fellowship, an NSF CAREER award, and the Distinguished Teaching Award of the Dean of Humanities and Sciences at Stanford. He has served on the editorial boards of the Journal of Computational and Graphical Statistics, the Journal of the Royal Statistical Society, the Annals of Statistics, the Annals of Applied Statistics, and Statistical Science. He was program co-chair of the 2006 Annual Meeting of the Institute of Mathematical Statistics and served on the executive committee of IMS from 1998 to 2012.

Back to Schedule Overview


Introduction to Python

Wednesday, August 3 & Thursday, August 4, 2022  |  8:00 AM - 11:00 AM PDT

Introduction to Python will focus on scientific computing, data science and machine learning. 

More precisely, the class will cover:

  • Python basics (variables, if/else, loops, functions)
  • Numpy and Pandas
  • Scipy and Scikit-learn

The class is designed for people with some programming experience, but no experience in Python. We will introduce each topic in enough depth that you can quickly start using Python for your own problems, knowing which tools are most appropriate. The workshop will be interactive, with many examples that participants can play with during the session.

Prerequisites: Basic programming knowledge (variables, if/else, loops, and functions) in a language other than Python is required.
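As a taste of the NumPy and Pandas material listed above (a minimal sketch, not workshop material), note how both libraries favor vectorized, label-based operations over explicit loops:

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic instead of explicit loops.
a = np.arange(5)            # array([0, 1, 2, 3, 4])
b = a ** 2                  # array([0, 1, 4, 9, 16])

# Pandas: tabular data with labeled columns.
df = pd.DataFrame({"x": a, "y": b})
mean_y = df["y"].mean()     # 6.0
subset = df[df["x"] > 2]    # rows where x is 3 or 4
```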

Andreas Santucci

About the Instructor: Andreas Santucci is a computational statistician, currently practicing Data Science at Google and Lecturing at Stanford University, where he teaches graduate students in STEM fields how to program in Python and C++.

Back to Schedule Overview


Introduction to Programming in R

Wednesday, August 3 & Thursday, August 4, 2022  |  1:00 PM - 4:00 PM PDT

This workshop is recommended for those who have at least some programming background in another language and who wish to learn the basics of R programming. The goal of this workshop is to familiarize participants with R for statistical analysis. Lectures will include practice questions to help guide students' understanding as we progress through the material.

Example topics:

  • Data types in R, variables, and apply functions.
  • Data I/O
  • Plotting in base R
  • Statistical applications, such as how to get a summary of the data and run linear regressions.

Andreas Santucci

About the Instructor: Andreas Santucci is a computational statistician, currently practicing Data Science at Google and Lecturing at Stanford University, where he teaches graduate students in STEM fields how to program in Python and C++.


Introduction to Mathematical Optimization

Friday, August 5 & Monday, August 8, 2022  |  1:00 PM - 4:00 PM PDT

Mathematical optimization underpins many applications across science and engineering, as it provides a set of formal tools to compute the ‘best’ action, design, control, or model from a set of possibilities. In data science, machine learning, and artificial intelligence, mathematical optimization is the engine of model training and learning. This workshop will provide an overview of the key elements of this topic (unconstrained, constrained, convex optimization, optimization for model training), and will have a practical focus, with participants formulating and solving optimization problems early and often using standard modeling languages and solvers. By introducing common models from machine learning and other fields, this workshop aims to make participants comfortable with optimization tools so that they may use them for rapid prototyping and experimentation in their own work. Students should be comfortable with linear algebra, multivariable calculus, and basic probability and statistics. Experience with Python will be helpful, but not required.

Topics to be discussed in this workshop include:

  • Formulating optimization problems
  • Fundamentals of constrained and unconstrained optimization
  • Convex optimization
  • Optimization methods for model fitting in machine learning
  • Optimization in Python using SciPy and CVXPY
  • In-depth Jupyter Notebook examples from machine learning, statistics, and other fields
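As a small preview of the SciPy item above (an illustrative sketch under the assumption that `scipy` is installed; not workshop material), an unconstrained problem can be formulated as a plain Python function and handed to a solver:

```python
import numpy as np
from scipy.optimize import minimize

# The Rosenbrock function, a classic unconstrained test problem
# whose minimizer is at (1, 1).
def rosenbrock(z):
    x, y = z
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

# Quasi-Newton (BFGS) minimization from a poor starting point.
result = minimize(rosenbrock, x0=np.array([-1.0, 2.0]), method="BFGS")
```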

Kevin Carlberg

About the Instructor: Kevin Carlberg is an AI Research Science Manager at Facebook Reality Labs and an Affiliate Associate Professor of Applied Mathematics and Mechanical Engineering at the University of Washington. He leads a research team focused on enabling the future of augmented and virtual reality through AI-driven innovations. His individual research combines concepts from machine learning, computational physics, and high-performance computing to drastically reduce the cost of simulating nonlinear dynamical systems at extreme scale. Previously, Kevin was a Distinguished Member of Technical Staff at Sandia National Laboratories in Livermore, California, where he led a research group of PhD students, postdocs, and technical staff in applying these techniques to a range of national-security applications in mechanical and aerospace engineering.

Back to Schedule Overview


Introduction to Machine Learning

Monday, August 8 & Tuesday, August 9, 2022  |  8:00 AM - 11:00 AM PDT

This workshop presents the basics behind understanding and using modern machine learning algorithms. We will discuss a framework for reasoning about when to apply various machine learning techniques, emphasizing questions of over-fitting/under-fitting, interpretability, supervised/unsupervised methods, and handling of missing data. The principles behind each algorithm (the why and how of using it) will be discussed, but not the mathematical detail underlying it, including proofs. Unsupervised machine learning algorithms presented will include k-means clustering, principal component analysis (PCA), multidimensional scaling (MDS), t-SNE, and independent component analysis (ICA). Supervised machine learning algorithms presented will include support vector machines (SVM), lasso, elastic net, classification and regression trees (CART), boosting, bagging, and random forests. Imputation, regularization, and cross-validation concepts will also be covered. The R programming language will be used for occasional examples, though participants need not have prior exposure to R.

Prerequisites: Undergraduate-level linear algebra and statistics; basic programming experience (R/Matlab/Python).
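Two of the unsupervised methods named above, k-means and PCA, fit in a few lines. The workshop's occasional examples use R; the sketch below uses Python with scikit-learn purely for illustration (assuming `scikit-learn` is installed; not workshop material):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Three well-separated Gaussian blobs in 5 dimensions.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 5)) for c in (0.0, 5.0, 10.0)])

# Unsupervised learning: cluster the points, then project to 2-D
# (e.g. for visualization) with principal component analysis.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)
```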

Alexander Ioannidis

About the Instructor: Dr. Alexander Ioannidis earned his Ph.D. in Computational and Mathematical Engineering and Masters in Management Science and Engineering both at Stanford University. He is a research fellow working on developing novel machine learning techniques for medical and genomic applications in the Department of Biomedical Data Science. Prior to Stanford he earned his bachelors in Chemistry and Physics from Harvard and a M.Phil from the University of Cambridge. He conducted research for several years on novel superconducting and quantum computing architectures at Northrop Grumman's Advanced Technologies research center. In his free time he enjoys sailing.

Back to Schedule Overview


Introduction to Deep Learning

Tuesday, August 9 & Wednesday, August 10, 2022  |  1:00 PM - 4:00 PM PDT

Deep Learning is a rapidly expanding field with new applications found every day. In this workshop, we will cover the fundamentals of deep learning for the beginner. We will introduce the math behind training deep learning models: the back-propagation algorithm. Building conceptual understanding of the fundamentals of deep learning will be the focus of the first part of the workshop. We will then cover some of the popular architectures used in deep learning, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), LSTMs, autoencoders and GANs. There will be a hands-on computing tutorial using Jupyter notebooks to build a basic image classification model via transfer learning.  By the end of the workshop, participants will have a firm understanding of the basic terminology and jargon of deep learning and will be prepared to dive into the plethora of online resources and literature available for each specific application area.

Prerequisites: Familiarity with basic concepts from linear algebra, such as vectors and matrices, as well as calculus concepts, such as differentiation. Familiarity with the Python programming language and an ability to use Jupyter notebooks will be helpful for the hands-on sessions.


About the Instructor: Dr. Aashwin Mishra is a Project Scientist at the Machine Learning Initiative at the SLAC National Accelerator Laboratory. His research focuses on uncertainty quantification, probabilistic modeling, interpretability/explainability, and optimization across physics applications.

Back to Schedule Overview


Data Visualization in Tableau

Thursday, August 11 & Friday, August 12, 2022  |  8:00 AM - 11:00 AM PDT

This workshop will cover best practices for telling compelling stories via data visualization, with demos and hands-on exercises in Tableau. Tableau, which originated as a Stanford research project, is a powerful tool for data exploration, manipulation, and visualization. Topics will include how we interpret visualizations, which types of visualizations are most effective in which contexts, and the difference between visual accuracy and visual precision. In Tableau, you will learn how to load data, create calculated fields to highlight key aspects of your data stories, create audience-appropriate visualizations (bar charts, trend lines, scatter plots, etc.), make use of annotations and interactivity, and more. The workshop’s goal is to help attendees in industry and academia better communicate their projects and research. At the end of the workshop, attendees will be able to better determine when to use complex visuals like network or parallel coordinate graphs versus when a bar chart or even no visual at all works best, and will have a solid foundation in creating visualizations in Tableau.

About the Instructors:

Dr. Kathryn Potts is the Director of Analytics in Stanford’s School of Engineering, and has worked in various data analyst roles at Stanford since 2009. She comes to analysis and visualization work from a background in linguistics, which she studied at Carleton College (BA) and UMass Amherst (PhD) and has taught at Hampshire College and Stanford University. She is fascinated by higher education, passionate about making sure decision-makers have the right information, presented the right way, at the right time, and always loves a good logic puzzle. Her cat is named Osgood, and she is the very best cat.

Luca Alessi, a graduate from UC Berkeley, is a visual artist currently working as the Lead Visual Designer at Environmental Health & Safety at Stanford University. In his freelance work, he is focused on translating complex ideas into accessible ones via animation, art, and data visualizations. He is particularly interested in playing with the constraints of data visualizations — both honoring them and challenging them — in order to capture the dynamic, boundless ambitions of the stories we tell. His cat is named T-Rex, and she is also the very best cat.

Back to Schedule Overview


Introduction to High Performance Computing

Thursday, August 11 & Friday, August 12, 2022  |  1:00 PM - 4:00 PM PDT

In the past 50 years, supercomputers have achieved what was once considered possible only in sci-fi movies. The key to the tremendous success of supercomputers has been a combination of outstanding architectures and software that uses all the available resources and makes parallelization possible. This secret sauce has led to different implementations across fields. Scientists typically rely on three main programming interfaces: OpenMP for shared-memory computers, CUDA for GPU computing, and MPI for distributed-memory computers. MPI in particular is essential for achieving performance on systems with millions of cores. This workshop explores the key features of these three approaches, explaining their underlying philosophy and how they leverage the different computer architectures. The final goal is to give the student a taste of the different programming paradigms and the tools to decide which is the best approach.

Eric Darve

About the Instructor:  Professor Eric Darve received his Ph.D. in Applied Mathematics at the Jacques-Louis Lions Laboratory, in the Pierre et Marie Curie University, Paris, France. His advisor was Prof. Olivier Pironneau, and his Ph.D. thesis was entitled "Fast Multipole Methods for Integral Equations in Acoustics and Electromagnetics." He was previously a student at the Ecole Normale Supérieure, rue d'Ulm, Paris, in Mathematics and Computer Science. Prof. Darve became a postdoctoral scholar with Profs. Moin and Pohorille at Stanford and NASA Ames in 1999 and joined the faculty at Stanford University in 2001. He is a member of the Institute for Computational and Mathematical Engineering.

Back to Schedule Overview


Deep Learning for Natural Language Processing

Monday, August 15 & Tuesday, August 16, 2022  |  8:00 AM - 11:00 AM PDT

This workshop will introduce common practical use cases where natural language processing (NLP) models are applied using the latest advances in deep learning (e.g. Transformer-based models such as BERT).  In this hands-on session, we will be coding in Python and using commonly used libraries such as Keras. The topics that we will cover include:

  • machine translation
  • sentiment extraction
  • named entity recognition

Some experience with both Python and Machine Learning is required. We recommend taking the introductory Machine Learning and Deep Learning ICME workshops for a better understanding of the material included in this NLP workshop.
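At the heart of the Transformer-based models mentioned above is scaled dot-product attention, which can be written in a few lines of NumPy (a minimal sketch for intuition only; the workshop itself uses libraries such as Keras):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention, the core operation inside Transformers/BERT."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted mix of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(Q, K, V)   # shape (4, 8): one output vector per query token
```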

Afshine and Shervine Amidi

About the Instructors: Afshine Amidi is currently working on solving NLP problems at Google. He also teaches the Data Science Tools class to graduate students at MIT. Previously, he worked in applied machine learning for recommender systems at Uber Eats, where he focused on building ranking models that improve the quality of the overall search results by taking into account several objective functions. Afshine has also published papers at the intersection of deep learning and computational biology. He holds a Bachelor’s and a Master’s Degree from École Centrale Paris and a Master’s Degree from MIT.

Shervine Amidi is currently working on problems at the intersection of ranking and natural language processing at Google. Previously, he worked in applied machine learning for recommender systems at Uber Eats where he focused on representation learning to better surface dish recommendations. Also, Shervine published a few papers at the intersection of deep learning and computational biology. He holds a Bachelor’s and a Master’s Degree from École Centrale Paris and a Master’s Degree from Stanford University.

Back to Schedule Overview


Data Privacy and Ethics

Monday, August 15 & Tuesday, August 16, 2022  |  1:00 PM - 4:00 PM PDT

This workshop engages with difficult challenges in the modern practice of data science and the design of data products. We will begin by discussing the promises and perils of mining digital exhaust: location, transaction, social media, and other data types that are increasingly recorded and accessible within digital platforms. The uses of such data will be discussed along a privacy–utility trade-off, providing a framework for thinking through objectives such as data minimization. The discussion of digital exhaust will carry forward into an introduction to differential privacy and the problems it can and cannot address. A second theme of the workshop will be discussing the relative merits of observational vs. experimental (A/B testing) data-driven decision making with regard to the design and improvement of data products. Specific examples will include recommendation systems and search engine design, but discussion will also touch on the decision-making surrounding the deployment of learning algorithms in online platforms.
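The Laplace mechanism is the standard introductory example of differential privacy: add calibrated noise so that any one individual's presence barely changes the released statistic. A minimal NumPy sketch (illustrative only, not workshop material):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(true_count: int, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one person changes
    the count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Smaller epsilon = stronger privacy, noisier answer.
noisy = laplace_count(1000, epsilon=0.5)
```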

Johan Ugander

About the Instructor: Johan Ugander is an Assistant Professor at Stanford University in the Department of Management Science & Engineering, within the School of Engineering. His research develops algorithmic and statistical frameworks for analyzing social systems, social networks, and other large-scale social data. Prior to joining the Stanford faculty he was a postdoctoral researcher at Microsoft Research Redmond 2014-2015 and held an affiliation with the Facebook Data Science team 2010-2014. He obtained his Ph.D. in Applied Mathematics from Cornell University in 2014. His awards include a Young Investigator Award from the Army Research Office (ARO), three Best Paper Awards (2012 ACM WebSci Best Paper, 2013 ACM WSDM Best Student Paper, 2020 AAAI ICWSM Best Paper), and the 2016 Eugene L. Grant Undergraduate Teaching Award from the Department of Management Science & Engineering.

Back to Schedule Overview


Intermediate Topics in Machine Learning and Deep Learning

Wednesday, August 17 & Friday, August 19, 2022  |  1:00 PM - 4:00 PM PDT

Through a series of rapid surveys, including guest lectures, we will present an overview of recent topics in deep learning and machine learning with particular relevance for practitioners. Areas will include embeddings and dimensionality reduction, transfer learning, representation learning, and weakly supervised, semi-supervised, self-supervised, and active learning. This workshop will assume familiarity with basic concepts from both machine learning and deep learning as taught in the introductory workshops on those topics, but it will not assume a deep statistical background. Prior experience with applying neural networks is highly recommended.


About the Instructor: Dr. Aashwin Mishra is a Project Scientist at the Machine Learning Initiative at SLAC. His research focuses on uncertainty quantification, probabilistic modeling, interpretability/explainability, and optimization across physics applications.

Back to Schedule Overview


If you would like to sign up to receive email notifications regarding the summer workshops, you can subscribe here.