
CME 510: Linear Algebra and Optimization Seminar


CME 510 is held on Thursdays at 4:30-5:30 PM in Y2E2-101.
January 11: NO SEMINAR
January 18: Dr. Joseph Grcar, Lawrence Berkeley National Lab (retired)
Title:  Mathematician and Soldier André-Louis Cholesky, and Phoenix of Physics Enrico Fermi
Abstract: I will present background material for two recently published biographies of André-Louis Cholesky and of Enrico Fermi, both of which I reviewed for SIAM Review*.  In particular, I will explain how Cholesky invented Cholesky's method and how Fermi stimulated interest in computational physics.  Aside from the important contributions Cholesky and Fermi made to computing and to physics, respectively, their lives are interesting as case studies of military research in government institutions.  After academic teaching, probably the most common career path for doctoral researchers today is government-sponsored work for military purposes.
*The book reviews are in SIAM Review vol 58 no 2 (Jun 2016) and vol 59 no 4 (Dec 2017).
Bio: Joseph Grcar was a computational scientist at national laboratories in Livermore and Berkeley for many years, where he was one of the original developers of the Chemkin software for chemical kinetics. He specialized in calculations for reacting fluid flows, combustion, and linear algebra.  He originated the partial reorthogonalization technique for Lanczos methods, and a matrix and a polynomial are named after him.  Grcar received a doctorate in mathematics from the University of Illinois in 1980.
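For readers unfamiliar with the method the talk refers to, here is a minimal Python sketch of Cholesky's factorization A = L Lᵀ of a symmetric positive-definite matrix. The function name and the 2×2 example are illustrative, not from the talk:

```python
import math

def cholesky(A):
    """Cholesky factorization A = L * L^T for a symmetric
    positive-definite matrix A given as a list of lists.
    Returns the lower-triangular factor L.
    Minimal illustrative sketch, not a robust solver."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # diagonal entry: requires A to be positive definite
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(A)   # L = [[2, 0], [1, sqrt(2)]]
```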
January 25: Yair Carmon, Electrical Engineering, Stanford
Title: The oracle complexity of finding stationary points of non-convex functions
Abstract: We establish algorithmic upper bounds and algorithm-independent lower bounds for finding approximate stationary points of smooth, high-dimensional, potentially non-convex functions.  To establish the upper bounds, we introduce "convex until proven guilty," an algorithm that leverages Nesterov's acceleration of gradient descent to guarantee faster convergence to stationarity, for non-convex functions with Lipschitz gradient and Hessian.  To establish our lower bounds, we combine classical ideas from large-scale convex optimization with novel non-convex constructions.  A main consequence of our results is that gradient descent is worst-case optimal for functions with only Lipschitz gradient; when more derivatives are Lipschitz, our lower bounds nearly match the "convex until proven guilty" rates.
Based on joint work with John Duchi, Oliver Hinder and Aaron Sidford.
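As background, the baseline whose worst-case optimality the lower bounds address is gradient descent run until the gradient is small. A minimal sketch, where the function names, step-size choice, and 1-D toy example are illustrative assumptions rather than the paper's code:

```python
def grad_descent_to_stationarity(grad, x0, L, eps, max_iter=100_000):
    """Gradient descent with the standard step size 1/L, stopped once
    |grad(x)| <= eps, i.e., at an eps-approximate stationary point.
    Illustrative 1-D sketch, not the paper's code."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) <= eps:
            break
        x -= g / L
    return x

# Toy nonconvex function f(x) = x^4/4 - x^2/2 with stationary
# points at -1, 0, 1; its gradient has Lipschitz constant 2 on [-1, 1].
grad = lambda x: x ** 3 - x
x = grad_descent_to_stationarity(grad, x0=0.5, L=2.0, eps=1e-6)
```

Started at x0 = 0.5, the iterates move monotonically toward the stationary point at x = 1.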
February 1: Chi Jin, Electrical Engineering and Computer Science, UC Berkeley
Title: Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Abstract: Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves a faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior to GD in the nonconvex setting remains open.  I will introduce a simple variant of AGD, and show that it escapes saddle points and finds a second-order stationary point in O(1/epsilon^{7/4}) iterations, faster than the O(1/epsilon^2) iterations required by GD.  To the best of our knowledge, this is the first gradient-based algorithm to find a second-order stationary point faster than GD, and also the first single-loop algorithm with a faster rate than GD even in the setting of finding a stationary point.
Our analysis is based on two key ideas:
(1) the use of a simple Hamiltonian function, inspired by a continuous-time perspective, which AGD monotonically decreases per step even for nonconvex functions;
(2) a novel framework called improve or localize, which is useful for tracking the long-term behavior of gradient-based optimization algorithms.
We believe that these techniques may deepen our understanding of both acceleration algorithms and nonconvex optimization.
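To make idea (1) concrete, here is a minimal sketch of Nesterov-style AGD iterates with the Hamiltonian f(x) + v²/(2η) recorded alongside. This is plain AGD on a toy scalar function, not the talk's variant (which adds a negative-curvature exploitation step to guarantee the monotone decrease); all names and parameter values are illustrative:

```python
def agd_with_hamiltonian(f, grad, x0, eta=0.05, theta=0.1, steps=200):
    """Nesterov-style accelerated gradient descent on a scalar function,
    recording the Hamiltonian E = f(x) + v^2 / (2*eta) after each step.
    Plain AGD on a toy problem; the talk's variant adds a
    negative-curvature step under which E provably decreases."""
    x, v = x0, 0.0
    energies = []
    for _ in range(steps):
        y = x + (1 - theta) * v      # momentum extrapolation
        x_next = y - eta * grad(y)   # gradient step at the extrapolated point
        v = x_next - x               # velocity = difference of iterates
        x = x_next
        energies.append(f(x) + v * v / (2 * eta))
    return x, energies

f = lambda x: 0.25 * x ** 4 - 0.5 * x ** 2   # nonconvex, minima at x = +-1
grad = lambda x: x ** 3 - x
x, E = agd_with_hamiltonian(f, grad, x0=0.5)
```

On this example the final Hamiltonian value is well below the initial one and the iterates settle at the minimizer x = 1.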
Bio: Chi Jin is a PhD candidate in Computer Science at UC Berkeley, advised by Michael Jordan.  He received his BS in Physics from Peking University in 2012.  His research primarily focuses on learning problems and optimization algorithms in the nonconvex setting.  His interests also include statistical learning and representation learning.
February 8: Oliver Hinder, Management Science & Engineering, Stanford
Title: A One-phase Interior Point Method For Nonconvex Optimization
Abstract: The work of Wächter and Biegler suggests that infeasible-start interior point methods (IPMs) developed for linear programming cannot be adapted to nonlinear optimization without significant modification, i.e., using a two-phase or penalty method.  We propose an IPM that, by careful initialization and updates of the slack variables, is guaranteed to find a first-order certificate of local infeasibility, local optimality, or unboundedness of the (shifted) feasible region. Our proposed algorithm differs from other IPMs for nonconvex programming because we reduce primal feasibility at the same rate as the barrier parameter.  This gives an algorithm with more robust convergence properties and closely resembles successful algorithms from linear programming.  We implement the algorithm and compare it with IPOPT on a subset of CUTEst problems.  Our algorithm requires a similar median number of iterations, but fails on only 9% of the problems compared with 16% for IPOPT.  Experiments on infeasible variants of the CUTEst problems indicate superior performance for detecting infeasibility.  (Joint work with Yinyu Ye.)
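For context, the barrier-parameter idea at the heart of interior point methods can be shown on a toy problem. The sketch below is a classical 1-D log-barrier method, not the proposed one-phase algorithm; every name and parameter in it is an illustrative assumption:

```python
def barrier_solve(mu0=1.0, shrink=0.5, rounds=30):
    """Minimize f(x) = x^2 subject to x >= 1 with a log barrier:
    for each barrier parameter mu, minimize x^2 - mu*log(x - 1)
    by damped Newton steps, then shrink mu.  A minimal illustration
    of the barrier idea, not the one-phase IPM from the talk."""
    mu, x = mu0, 2.0                      # strictly feasible start
    for _ in range(rounds):
        for _ in range(50):               # Newton on the barrier subproblem
            g = 2 * x - mu / (x - 1)      # gradient of the barrier function
            h = 2 + mu / (x - 1) ** 2     # Hessian (always positive here)
            step = g / h
            while x - step <= 1:          # damp to stay strictly feasible
                step *= 0.5
            x -= step
        mu *= shrink                      # tighten the barrier
    return x

x = barrier_solve()   # approaches the constrained optimum x = 1 from inside
```

As mu shrinks, the subproblem minimizer (1 + sqrt(1 + 2 mu)) / 2 approaches the constrained optimum x = 1, while every iterate stays strictly feasible.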
February 15: Viral Shah, Julia Computing
Title:  On Machine Learning and Programming Languages
Abstract: We ask, what might the ideal ML language of the future look like? Our thoughts are published in a blog post, excerpted below:
As programming languages (PL) people, we have watched with great interest as machine learning (ML) has exploded -- and with it, the complexity of ML models and the frameworks people are using to build them.  State-of-the-art models are increasingly programs, with support for programming constructs like loops and recursion, and this brings out many interesting issues in the tools we use to create them -- that is, programming languages.
While machine learning does not yet have a dedicated language, several efforts are effectively creating hidden new languages underneath a Python API (like TensorFlow) while others are reusing Python as a modeling language (like PyTorch).  We'd like to ask -- are new ML-tailored languages required, and if so, why?
We will also discuss how Julia evolved to get where it is today, and how it might evolve to take on some of the challenges posed by machine learning.
February 22: Robert Bixby, Gurobi Optimization, Inc.
Title: Optimization: Past, Present, Future
Abstract: For the vast majority of business applications, optimization means linear and mixed-integer programming.  Beginning with Dantzig's simplex method in 1947, optimization experienced a slow, uneven period of development into the mid 1980s.  Then, beginning in the late 1980s, developments ensued that completely transformed optimization and its applications, driven by truly remarkable performance improvements in the underlying solvers.  What's coming next may be even more exciting. Driven by an explosion in available business data, a new broad corporate focus on extracting value from that data, increased computing power, and the continually expanding power of optimization solvers, optimization promises to become an indispensable tool in managing the modern enterprise.
Bio: Dr Bixby has a BS in IEOR (UC Berkeley) and a PhD in OR (Cornell). He is Professor Emeritus in CAAM (Rice).  He is co-founder of CPLEX (1987) and Gurobi (2008).
March 1: Ben Southworth, Lawrence Livermore National Lab
March 8: TBA
March 15: Huda Nassar, CS, Purdue