FALL 2018 SCHEDULE
Next, we address the computationally expensive offline stage of currently available ROM techniques. We then introduce SNS, a practical method for reducing this offline cost. The conforming subspace condition and the subspace inclusion relation are used to justify SNS, and numerical results show that the SNS solution accuracy is comparable to that of traditional methods.
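As general background on the offline/online split mentioned above, here is a minimal sketch of the standard POD offline stage of a projection-based ROM, the step whose cost the abstract refers to. This is generic textbook machinery under assumed toy sizes, not the SNS method itself.

```python
# Minimal sketch: POD basis construction, the classic offline stage of a
# projection-based ROM (generic background, not the SNS method).
import numpy as np

def pod_basis(snapshots, energy=0.999):
    """Return a reduced basis from a snapshot matrix (n_dof x n_snapshots)."""
    # The thin SVD of the snapshot matrix dominates the offline cost,
    # and it must be redone whenever the snapshot set changes.
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    # Keep the fewest modes that capture the requested snapshot energy.
    cum = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(cum, energy)) + 1
    return U[:, :r]

# Toy usage (illustrative sizes): 1000 degrees of freedom, 50 snapshots.
X = np.random.rand(1000, 50)
V = pod_basis(X)
print(V.shape)  # (1000, r) reduced basis
```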
October 18: TBD
October 25: Amir Gholami, BAIR, UC Berkeley
Looking Beyond SGD: Robust Optimization and Second-Order Information for Large-Scale Training of Neural Networks
An important next step in machine learning is the ability to train on massively large datasets. However, stochastic gradient descent (SGD), the de facto method for training neural networks, does not scale well without expensive hyper-parameter tuning. One approach to the challenge of large-scale training is to use large mini-batch sizes, which allows training to be parallelized. However, large-batch training with SGD often results in models with poor generalization and poor robustness. The methods proposed so far to address this work only for special cases, and often require hyper-parameter tuning themselves.
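To make the large-batch trade-off concrete, here is a minimal sketch of mini-batch SGD on a toy logistic-regression problem, using the common linear learning-rate scaling heuristic. This illustrates the kind of hyper-parameter adjustment the abstract alludes to; the problem, sizes, and scaling rule are illustrative assumptions, not the speaker's setup.

```python
# Minimal sketch (toy logistic regression, not the speaker's method):
# large mini-batch SGD with the common linear learning-rate scaling rule.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 10))
y = (X @ rng.normal(size=10) > 0).astype(float)

def sgd(batch_size, base_lr=0.1, base_batch=128, epochs=5):
    # Linear scaling heuristic: grow the learning rate with the batch size.
    lr = base_lr * batch_size / base_batch
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            z = np.clip(X[b] @ w, -30.0, 30.0)      # avoid exp overflow
            p = 1.0 / (1.0 + np.exp(-z))            # sigmoid
            w -= lr * X[b].T @ (p - y[b]) / len(b)  # logistic-loss gradient
    return w

w_small = sgd(batch_size=128)   # many noisy updates per epoch
w_large = sgd(batch_size=2048)  # few updates; relies on the scaled lr
```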
Here, we present very recent results on a novel Hessian-based method that, in combination with robust optimization, avoids many of the aforementioned issues. Extensive testing of the method on different neural networks (state-of-the-art residual networks and even compressed models such as SqueezeNext) on multiple datasets (CIFAR-10/100, SVHN, and ImageNet) shows significant improvements over the state of the art, without any tuning. We also discuss how these algorithms can be effectively parallelized through communication-avoiding algorithms, achieving up to 13x speedup over the baseline.
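As generic background on "second-order information": below is a minimal PyTorch sketch of Hessian-vector products via double backpropagation, with power iteration to estimate the top Hessian eigenvalue. This is standard machinery of the kind such Hessian-based methods build on, not the specific algorithm in the papers listed below; the toy model and iteration count are assumptions.

```python
# Minimal sketch: Hessian-vector products without forming the Hessian,
# plus power iteration for the dominant Hessian eigenvalue (generic
# second-order machinery, not the algorithm from the papers below).
import torch

model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = list(model.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)

def hvp(v):
    # Hessian-vector product: differentiate <grad, v> w.r.t. the parameters.
    dot = sum((g * vi).sum() for g, vi in zip(grads, v))
    return torch.autograd.grad(dot, params, retain_graph=True)

# Power iteration: repeatedly apply the Hessian and renormalize.
v = [torch.randn_like(p) for p in params]
for _ in range(20):
    hv = hvp(v)
    norm = torch.sqrt(sum((h**2).sum() for h in hv))
    v = [h / norm for h in hv]
eig = sum((h * vi).sum() for h, vi in zip(hvp(v), v)).item()
print(f"estimated top Hessian eigenvalue: {eig:.4f}")
```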
- Related papers:
  - arXiv:1802.08241 (NIPS'18)
  - arXiv:1810.01021 (under review)
  - arXiv:1712.04432 (SPAA'18)
Bio: Amir Gholami is a postdoctoral research fellow in the BAIR Lab at UC Berkeley. He received his PhD from UT Austin, working with Prof. George Biros on biophysics-based image analysis, a research topic that received UT Austin's best doctoral dissertation award in 2018. He is a Melosh Medal finalist, recipient of the best student paper award at SC'17 and a Gold Medal in the ACM Student Research Competition, as well as a best student paper finalist at SC'14. His current research includes large-scale training of neural networks, stochastic second-order optimization methods, and robust optimization.