PhD Thesis Defense
In the past couple of decades, non-smooth convex optimization has emerged as
a powerful tool for the recovery of structured signals (sparse, low-rank, etc.)
from noisy linear or non-linear measurements in a variety of applications in
genomics, signal processing, wireless communications, and machine learning.
Taking advantage of the particular structure of the unknown signal of interest
is critical, since in most of these applications the dimension p of the signal to
be estimated is comparable to, or even larger than, the number of observations n.
With the advent of Compressive Sensing, a large body of theoretical results has
emerged that studies the estimation performance of non-smooth convex
optimization in such a high-dimensional setting.
A popular approach for estimating an unknown signal β₀ ∈ R^p in a generalized
linear model, with observations y = g(Xβ₀) ∈ R^n, is via solving the estima-
tor β̂ = arg min_β L(y, Xβ) + f(β). Here, L(·, ·) is a loss function which is
convex with respect to its second argument, and f(·) is a regularizer that en-
forces the structure of the unknown β₀. We first analyze the generalization error
performance of this estimator for the case where the entries of X are drawn
independently from the real standard Gaussian distribution. The precise nature of
our analysis permits an accurate performance comparison between different in-
stances of these estimators, and allows us to optimally tune the hyperparameters
based on the model parameters. We apply our result to some of the most popu-
lar cases of generalized linear models, such as M-estimators in linear regression,
logistic regression and generalized margin maximizers in binary classification
problems, and Poisson regression in count data models.
our proof is the Convex Gaussian Min-max Theorem (CGMT), which is a tight
version of the Gaussian comparison inequality proved by Gordon in 1988. Un-
fortunately, having real i.i.d. entries in the feature matrix X is crucial in this
theorem, and it cannot be naturally extended to other cases.
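As a concrete instance of the estimator template β̂ = arg min_β L(y, Xβ) + f(β), the sketch below solves the LASSO (squared loss with an ℓ1 regularizer) under a Gaussian design, using proximal gradient descent (ISTA). The dimensions, noise level, regularization strength, and iteration count are illustrative choices for this sketch, not the tuned values analyzed in the thesis.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (elementwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    # Proximal gradient (ISTA) for min_b 0.5*||y - X b||^2 + lam*||b||_1.
    p = X.shape[1]
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Sparse ground truth and i.i.d. standard Gaussian design, as in the analyzed setting.
rng = np.random.default_rng(0)
n, p, k = 200, 400, 10
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:k] = 1.0
y = X @ beta0 + 0.1 * rng.standard_normal(n)
beta_hat = lasso_ista(X, y, lam=2.0)
```

Here the ℓ1 regularizer plays the role of f, enforcing sparsity, which is what makes recovery possible even though p = 400 exceeds n = 200.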
For some special cases, however, we prove universality properties and thereby
indirectly extend these results to more general designs of the feature matrix X,
where the entries are not necessarily real, independent, or identically distributed.
This extension enables us to analyze problems that the CGMT cannot handle, such
as models with quadratic measurements, PhaseLift in phase retrieval, and data
recovery in massive MIMO, and helps us settle a few long-standing open problems
in these areas.