EE Systems Seminar
Abstract: In this talk, we study two classes of optimization problems arising in modern machine learning applications: "learning deep models" and "tuning hyper-parameters". In the first part of the talk, we present a unifying framework for studying the local/global optima equivalence of modern non-convex optimization problems. Using the local openness property of the underlying training models, we provide simple sufficient conditions under which any local optimum of the resulting optimization problem is globally optimal. We first completely characterize the local openness of the matrix multiplication mapping in its range. We then use this characterization to: 1) show that every local optimum of two-layer linear networks is globally optimal. Unlike many existing results in the literature, our result requires no assumption on the target data matrix $Y$ or the input data matrix $X$; 2) develop an almost complete characterization of the local/global optima equivalence of multi-layer linear neural networks. We provide various counterexamples to show the necessity of each of our assumptions; 3) show global/local optima equivalence of non-linear deep models having a certain pyramidal structure. Unlike some existing works, our result requires no assumption on the differentiability of the activation functions or the loss function.
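For concreteness, the two-layer linear-network result in part 1) concerns objectives of the following form (the notation here is ours, chosen to illustrate the abstract's claim, not taken from the talk):
$$
\min_{W_2,\, W_1}\; \big\| W_2 W_1 X - Y \big\|_F^2,
$$
where $W_1$ and $W_2$ are the weight matrices of the two layers. The claim is that every local optimum of this non-convex problem is globally optimal, for arbitrary $X$ and $Y$. The link to local openness is that the multiplication map $(W_2, W_1) \mapsto W_2 W_1$ being locally open (onto its range) around a point lets local optimality of the factored problem transfer to local, and hence global, optimality of the convex problem in the product $W = W_2 W_1$.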
In the second part of the talk, we consider the problem of optimizing the cross-validation (CV) loss for a given learning problem. We first develop a computationally efficient approximation of CV and provide theoretical guarantees for its performance. We then use this approximation to build an optimization algorithm for finding the optimal hyper-parameters in the empirical risk minimization framework. In our numerical experiments, we illustrate the accuracy and efficiency of our approximation, as well as of our proposed framework for finding the optimal regularizer. This is joint work with Maher Nouiehed (USC), Ahmad Beirami (MIT), Shahin Shahrampour (Harvard), and Vahid Tarokh (Harvard).
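The talk does not spell out the construction, but a common way to approximate leave-one-out CV without $n$ refits is to take a single Newton-style correction of the full-data solution, reusing one Hessian for all samples. The numpy sketch below illustrates this idea for ridge regression; it is our own illustrative example under that assumption, not the speakers' actual algorithm, and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 40, 5, 1.0
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Full-data ridge solution: minimizes 0.5*sum_i (x_i'theta - y_i)^2 + 0.5*lam*||theta||^2
H = X.T @ X + lam * np.eye(d)            # Hessian of the full objective
theta = np.linalg.solve(H, X.T @ y)

# Approximate leave-one-out: one Newton-style correction per sample,
# reusing the single full-data Hessian instead of n separate refits.
Hinv = np.linalg.inv(H)
resid = X @ theta - y
approx_losses = []
for i in range(n):
    grad_i = X[i] * resid[i]             # gradient of sample i's loss at theta
    theta_i = theta + Hinv @ grad_i      # approximate leave-i-out solution
    approx_losses.append(0.5 * (X[i] @ theta_i - y[i]) ** 2)
approx_cv = float(np.mean(approx_losses))

# Exact leave-one-out by refitting n times, for comparison.
exact_losses = []
for i in range(n):
    mask = np.arange(n) != i
    Hi = X[mask].T @ X[mask] + lam * np.eye(d)
    ti = np.linalg.solve(Hi, X[mask].T @ y[mask])
    exact_losses.append(0.5 * (X[i] @ ti - y[i]) ** 2)
exact_cv = float(np.mean(exact_losses))
```

The payoff is computational: the exact loop solves $n$ linear systems, while the approximation factors the Hessian once and then does only a matrix-vector product per sample.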
Bio: Meisam Razaviyayn is an assistant professor in the Department of Industrial and Systems Engineering at the University of Southern California. Prior to joining USC, he was a postdoctoral research fellow in the Electrical Engineering Department at Stanford University. He obtained his Ph.D. degree in Electrical Engineering with a minor in Computer Science from the University of Minnesota in 2014. He is the recipient of the Signal Processing Society Young Author Best Paper Award, and he was among the three finalists for the Best Paper Prize for Young Researchers in Continuous Optimization at ICCOPT 2013 and 2016. His research interests include the design and study of data analysis algorithms and tools that can efficiently scale to modern big data problems.