PhD Thesis Defense
Many driving factors of physical systems are latent or unobserved, so understanding such systems relies crucially on accounting for the influence of this latent structure. This talk describes advances in three aspects of latent-variable modeling: inference, algorithms, and applications. Concretely, motivated by the goal of obtaining an accurate and interpretable statistical model of the California reservoir system, we focus on two key challenges that arise:
- The latent structure of the latent-variable model of the reservoirs encodes the effect of external factors. How do we assess the extent to which our latent-variable model has made true or false discoveries about the relevant physical phenomena? Existing inferential techniques rely on the discrete structure of the decision space and are not applicable in settings where the underlying model exhibits a more complicated structure (e.g., the smooth structure of latent spaces).
- Many relevant variables in water resources deviate strongly from Gaussianity. Existing techniques for fitting a graphical model to data suffer from one or more of these deficiencies: a) they are unable to handle non-Gaussianity, b) they are based on non-convex or computationally intractable algorithms, or c) they cannot account for latent variables.
Methods to address these challenges would be useful for reservoir modeling and more broadly across the data sciences. With respect to the first challenge, we describe a geometric reformulation of the notion of a discovery, which enables the development of model selection methodology for a broader class of problems. We highlight the utility of this viewpoint in problems involving latent-variable modeling and low-rank estimation, with a specific algorithm to control false discoveries in these settings. With respect to the second challenge, we develop a framework, based on Generalized Linear Models, that addresses all three of these deficiencies. A particularly novel aspect of our formulation is that it incorporates regularizers tailored to the type of latent variables (e.g., the max-2 norm for Bernoulli variables and the completely positive norm for Poisson variables), with a corresponding semidefinite relaxation in each case.
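
To give a rough sense of what a geometric notion of discovery might look like in low-rank estimation, the sketch below treats an estimated latent subspace as the "discovery" and measures how much of its dimension lies outside a reference subspace. This is an illustrative assumption, not necessarily the definition used in the thesis; the function names `projector` and `subspace_false_discovery` are hypothetical, and the bases are assumed to have full column rank.

```python
import numpy as np


def projector(basis):
    """Orthogonal projector onto the column span of `basis` (assumed full column rank)."""
    q, _ = np.linalg.qr(basis)  # reduced QR: columns of q are an orthonormal basis
    return q @ q.T


def subspace_false_discovery(est_basis, true_basis):
    """Illustrative measure (an assumption, not the thesis's definition):
    the portion of the estimated subspace's dimension that lies outside
    the true subspace, computed as trace((I - P_true) @ P_est)."""
    p_est = projector(est_basis)
    p_true = projector(true_basis)
    n = p_true.shape[0]
    return float(np.trace((np.eye(n) - p_true) @ p_est))


# Toy example in R^4: the true latent subspace is span{e1, e2}.
identity = np.eye(4)
true_basis = identity[:, [0, 1]]

# Estimate sharing one direction with the truth and one orthogonal to it.
est_disjoint = identity[:, [0, 2]]

# Estimate whose second direction is tilted halfway out of the true subspace.
tilted = (identity[:, 1] + identity[:, 2]) / np.sqrt(2.0)
est_tilted = np.column_stack([identity[:, 0], tilted])

fd_disjoint = subspace_false_discovery(est_disjoint, true_basis)
fd_tilted = subspace_false_discovery(est_tilted, true_basis)
```

Unlike a count of wrongly selected variables, this quantity varies continuously as the estimated subspace rotates away from the reference one, which is the kind of smooth structure that discrete notions of a discovery do not capture.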