Speaker: Steven Scott, Senior Economic Analyst at Google
Title: Scaling Bayesian learning through consensus Monte Carlo
Abstract: A useful definition of “big data” is data that is too big to comfortably process on a single machine because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory or disk bottlenecks can only be eliminated by splitting data across multiple machines. Communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each machine, and then averaging individual Monte Carlo draws across machines. Depending on the model, the resulting draws can be nearly indistinguishable from the draws that would have been obtained by running a single-machine algorithm for a very long time. Examples of consensus Monte Carlo are shown for simple models where single-machine solutions are available, for large single-layer hierarchical models, and for Bayesian additive regression trees (BART).
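Illustration: The averaging step described in the abstract can be sketched on a toy problem. This is a minimal sketch, not the speaker's implementation. It assumes a Gaussian model with known standard deviation, a flat prior on the mean, and equal-sized shards, so each shard's posterior is available in closed form and a plain (unweighted) average of draws is appropriate. The function name consensus_draws and all of its parameters are hypothetical, and the "machines" are simulated in a single process.

```python
import numpy as np

def consensus_draws(y, num_shards, num_draws, sigma=1.0, seed=0):
    """Toy consensus Monte Carlo for the mean of Gaussian data.

    Assumptions (illustrative only): known sigma, flat prior on the mean,
    equal-sized shards. Under these assumptions each shard's posterior is
    Normal(shard mean, sigma^2 / shard size).
    """
    rng = np.random.default_rng(seed)
    shards = np.array_split(y, num_shards)

    # Each simulated "machine" draws independently from its own shard posterior.
    shard_draws = np.stack([
        rng.normal(shard.mean(), sigma / np.sqrt(len(shard)), size=num_draws)
        for shard in shards
    ])

    # Consensus step: average the i-th draw across machines.
    return shard_draws.mean(axis=0)

# Simulated data standing in for a data set split across 10 machines.
data_rng = np.random.default_rng(1)
y = data_rng.normal(2.0, 1.0, size=10_000)

draws = consensus_draws(y, num_shards=10, num_draws=5_000)

# For this model the full-data posterior is Normal(mean(y), 1 / len(y)),
# so the consensus draws can be compared against it directly.
print("consensus mean:", draws.mean(), " full-data mean:", y.mean())
print("consensus sd:  ", draws.std(), " full-data sd:  ", 1.0 / np.sqrt(len(y)))
```

In this Gaussian case the averaged draws have the same distribution as draws from the full-data posterior, which is the kind of simple model with a known single-machine solution that the abstract mentions. For unequal shards or non-Gaussian models, a weighted average or other adjustments may be needed, and the match is only approximate.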
Bio: Steven Scott is a Senior Economic Analyst at Google, where he has worked since 2008. He received his PhD from the Harvard statistics department in 1998. He spent nine years on the faculty of the Marshall School of Business at the University of Southern California. Between USC and Google he also had a brief tenure at Capital One, where he was a Director of Statistical Analysis. Dr. Scott is a Bayesian statistician specializing in Monte Carlo computation. In his academic life he has written papers on Bayesian methods for hidden Markov models, multinomial logistic regression, item response models, and support vector machines. These methods have been applied to network intrusion detection, web traffic modeling, educational testing, health state monitoring, and brand choice, among others. Since joining Google he has focused on models for time series with many contemporaneous predictors, on scalable Monte Carlo computation, and on Bayesian methods for the multi-armed bandit problem.