Monday, May 7, 2012
4:15 pm
Annenberg 105

Applied Mathematics Colloquium

Statistics and Computation in the Age of Massive Data
Michael Jordan, Pehong Chen Distinguished Professor, EECS & Statistics, UC Berkeley
There are many issues remaining to be addressed, or even formulated, at the interface of statistics and computation. One way to capture the current state of affairs is the following: if we view data as a resource, how can it be that in many practical problems of interest we find ourselves embarrassed by being given too much data? Our inferential procedures typically use polynomial amounts of time and space, but that does not suffice; we need to be able to guarantee that, on a fixed computational budget, the statistical risk decreases as the number of data points grows (without bound). Since a general theory is not yet available, in this talk I present three vignettes that describe various lines of attack on the problem: one involving the bootstrap, another involving matrix completion algorithms, and the third involving phylogenetic analysis in the regime of large numbers of taxa. All three vignettes involve divide-and-conquer strategies, with the third being particularly interesting in this regard (there, divide-and-conquer arises from Poisson thinning). [Joint work with Alexandre Bouchard-Côté, Ariel Kleiner, Lester Mackey, Purna Sarkar and Ameet Talwalkar.]
Contact: Sydney Garstang, sydney@caltech.edu, x4555
For more information see http://www.acm.caltech.edu