Caltech Logo

Rigorous Systems Research Group (RSRG) Seminar

Thursday, February 28, 2019
12:00pm to 1:00pm
Add to Cal
Annenberg 213
Understanding the Bias and Data Efficiency of Uncertainty Sampling
Steve Mussmann, PhD Candidate, Computer Science, Stanford University,

Uncertainty Sampling is perhaps the most common active learning algorithm used in practice. In this talk, we describe two phenomena about Uncertainty Sampling and the corresponding theoretical explanations that we developed to explain these phenomena. First, Uncertainty Sampling with logistic regression on different datasets yields a wide range of data efficiencies (the factor reduction in the number of samples to reach the same error). We find both empirically and theoretically that this variation in data efficiency is inversely related to the error rate of the final classifier. Second, Uncertainty Sampling on a convex loss can converge to a wide variety of final error rates, and sometimes even lower than the error of random sampling, given infinite samples for a given dataset. We explain this phenomenon by finding that Uncertainty Sampling is implicitly optimizing the nonconvex zero-one loss.

For more information, please visit CMS Seminars & Events.