EE Systems Seminar
ABSTRACT In many scientific domains, the number of individuals in the population under study is very large, but the number of observations available per individual is very limited (sparse). Limited observations prohibit accurate estimation of the parameters of interest for any given individual. In this sparse data regime, the key question is: how accurately can we estimate the distribution of parameters over the population? This problem arises in various domains such as epidemiology, psychology, health care, biology, and the social sciences. As an example, suppose that for a large random sample of the population we have observations of whether each person caught the flu in each of the past 5 years. We cannot accurately estimate the probability of any given person catching the flu from only 5 observations; our goal, however, is to estimate the distribution of these probabilities over the whole population. Such an estimated distribution can be used in downstream tasks, such as testing whether the distribution is uniform or estimating what fraction of the population has probability greater than 1/2 of contracting the flu.
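To make the flu example concrete, here is a minimal simulation sketch. The Beta(2, 5) population, the sample size, and the function names are assumptions chosen only for illustration; none of them come from the talk. It shows that each person's empirical frequency can take only 6 distinct values when there are 5 observations, while the histogram of counts across the population still summarizes the data available for estimating the distribution.

```python
import random
from collections import Counter

def simulate_counts(num_people, t, draw_p, rng):
    """Return, for each person, how many of t years they caught the flu."""
    counts = []
    for _ in range(num_people):
        p = draw_p(rng)  # this person's hidden flu probability
        counts.append(sum(rng.random() < p for _ in range(t)))
    return counts

def count_histogram(counts, t):
    """Fraction of people observed with 0, 1, ..., t flu years."""
    tally = Counter(counts)
    n = len(counts)
    return [tally.get(s, 0) / n for s in range(t + 1)]

rng = random.Random(0)
t = 5
# Hypothetical population: p ~ Beta(2, 5), used purely as an illustration.
counts = simulate_counts(20000, t, lambda r: r.betavariate(2, 5), rng)
hist = count_histogram(counts, t)

# Per-person estimates are coarse: only multiples of 1/t can occur.
naive_estimates = sorted({c / t for c in counts})
# The population mean of p is still recoverable from the pooled counts.
pooled_mean = sum(counts) / (len(counts) * t)
```

In this sketch `naive_estimates` contains at most `t + 1` values, whereas `pooled_mean` should land close to the Beta(2, 5) mean of 2/7, which hints at why population-level quantities can be estimable even when individual ones are not.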
Our main results show that the maximum likelihood estimator (MLE) is minimax optimal in the sparse observation regime. While the MLE for this problem was proposed as early as the late 1960s, how accurately it recovers the true distribution was not known. Our work closes this gap. In the course of our analysis, we provide new results in polynomial approximation, including novel bounds on the coefficients of Bernstein polynomials approximating Lipschitz-1 functions. Furthermore, the MLE is efficiently computable in this setting. We evaluate the MLE on real datasets and show that the natural MLE performs on par with more complex moment-matching estimators.
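One way to picture the MLE in this setting is sketched below. This is not the authors' implementation: restricting the distribution to a fixed grid on [0, 1] and fitting the grid weights with EM iterations are assumptions made here for illustration, as are the function names, grid size, and iteration count. The input is the histogram of per-person counts (the fraction of people with s successes out of t trials).

```python
import math

def binom_pmf(s, t, p):
    """Probability of s successes in t trials with success probability p."""
    return math.comb(t, s) * p**s * (1 - p) ** (t - s)

def grid_mle(hist, t, grid_size=51, iters=500):
    """Approximate the MLE of the mixing distribution on a fixed grid via EM."""
    grid = [j / (grid_size - 1) for j in range(grid_size)]
    like = [[binom_pmf(s, t, p) for p in grid] for s in range(t + 1)]
    w = [1.0 / grid_size] * grid_size  # start from uniform weights
    for _ in range(iters):
        new_w = [0.0] * grid_size
        for s in range(t + 1):
            if hist[s] == 0:
                continue
            # Probability of seeing count s under the current weights.
            mix = sum(w[j] * like[s][j] for j in range(grid_size))
            # Redistribute the mass of hist[s] by posterior responsibility.
            for j in range(grid_size):
                new_w[j] += hist[s] * w[j] * like[s][j] / mix
        w = new_w
    return grid, w

def fitted_histogram(grid, w, t):
    """Count histogram implied by the fitted weights."""
    return [sum(wj * binom_pmf(s, t, p) for p, wj in zip(grid, w))
            for s in range(t + 1)]

# Hypothetical population: half have p = 0.2, half have p = 0.8.
t = 5
hist = [0.5 * binom_pmf(s, t, 0.2) + 0.5 * binom_pmf(s, t, 0.8)
        for s in range(t + 1)]
grid, weights = grid_mle(hist, t)
fitted = fitted_histogram(grid, weights, t)
mean_p = sum(p * wj for p, wj in zip(grid, weights))
```

With only t = 5 observations per person, many different distributions produce the same count histogram, so the fitted weights need not resemble the two-point population used to generate `hist`; what the sketch checks is that the fitted distribution reproduces the observed histogram.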
Joint work with Weihao Kong, Gregory Valiant, and Sham Kakade.
BIO Ramya Korlakai Vinayak is a postdoctoral researcher in the Paul G. Allen School of Computer Science and Engineering at the University of Washington in Seattle, working with Sham Kakade. Her research interests broadly span the areas of machine learning, statistical inference, and crowdsourcing. She received her Ph.D. in Electrical Engineering from Caltech, where she worked with Babak Hassibi. She was supported by the Faculty for the Future fellowship (2013-2015) awarded by the Schlumberger Foundation. She obtained her B.Tech from IIT Madras and her MS from Caltech.