BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//main.oscweb.caltech.edu//Events//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Machine Learning & Scientific Computing Series
X-WR-TIMEZONE:America/Los_Angeles
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
LAST-MODIFIED:20230407T050750Z
TZURL:https://www.tzurl.org/zoneinfo-outlook/America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZNAME:PDT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZNAME:PST
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
SUMMARY:Examining Occam's Razor in Deep Neural Networks Using Kolmogorov C
omplexity
DTSTART;TZID=America/Los_Angeles:20220329T120000
DTEND;TZID=America/Los_Angeles:20220329T130000
DTSTAMP:20240910T123421Z
UID:Machine Learning & Scientific Computing Series@Tue Mar 29 12:00:00 202
2@main.oscweb.caltech.edu
DESCRIPTION:Ard A. Louis\, Professor of Theoretical Physics\, Department o
 f Physics\, University of Oxford\n\nClassic arguments from statistical lea
 rning theory\, often formulated in terms of the bias-variance tradeoff\, s
 uggest that models with high capacity should overfit and therefore general
 ize poorly on unseen data. Deep neural networks (DNNs) appear to break thi
 s basic rule of statistics\, because they perform best in the overparamete
 rized regime. One way of formulating this conundrum is in terms of inducti
 ve bias: DNNs are highly expressive\, and so can represent almost any func
 tion that fits a training data set. Why\, then\, are they biased towards f
 unctions that generalize well? The source of this inductive bias must aris
 e from an interplay between network architecture\, training algorithms\, a
 nd structure in the data.\nTo disentangle these three components\, we appl
 y a Bayesian picture\, based on the functions expressed by a DNN\, to supe
 rvised learning for some simple classification problems\, including Boolea
 n functions\, MNIST\, and CIFAR10. We show that the DNN prior over functio
 ns is determined by the architecture\, and is biased towards "simple" fun
 ctions with low Kolmogorov complexity. This simplicity bias can be varied b
 y exploiting a transition between ordered and chaotic regimes. The likelih
 ood is calculated from the error spectrum of functions on data sets. Combi
 ning the prior and the likelihood to calculate the posterior accurately pr
 edicts the behavior of DNNs trained with stochastic gradient descent. This a
 nalysis suggests that overcoming the traditional bias-variance problem fo
 r models with high capacity requires an Occam's-razor-like inductive bia
 s towards simple functions\, one powerful enough to overcome the exponenti
 al growth in the number of functions with complexity. When this picture i
 s combined with structured data\, it helps explain the big-picture questio
 n of why DNNs generalize in the overparameterized regime. It doesn't (yet
 ) explain why some DNNs generalize better than others.\nhttps://caltech.zo
 om.us/rec/share/8KJQ21y5kikJqrHDqSwQH3Fl8ra7sx7uV8nh4x3lRUUtyEbWvD8eO_56Hb
 oj7eaT.F2mMqEy0VbeLtlea Passcode: K3GN.5+i
LOCATION:Online Event
URL:https://www.caltech.edu/campus-life-events/calendar/machine-learning-s
cientific-computing-series-3
END:VEVENT
END:VCALENDAR