Caltech

H.B. Keller Colloquium

Monday, April 1, 2024
4:00pm to 5:00pm
Annenberg 105
Score Entropy Discrete Diffusion Models
Stefano Ermon, Associate Professor, Department of Computer Science, Stanford University

Diffusion models are at the core of many state-of-the-art generative AI systems for content such as images, videos, and audio. These models crucially rely on estimating gradients of the data distribution (scores), and efforts to generalize score-based modeling to discrete structures have had limited success. As a result, state-of-the-art generative models for discrete data such as language are based on autoregressive modeling (i.e., next-token prediction). In this work, we bridge this gap by proposing a framework that extends score matching to discrete spaces and integrates seamlessly into the construction of discrete diffusion models. The resulting Score Entropy Discrete Diffusion models are an alternative probabilistic modeling technique that achieves highly competitive performance at the scale of GPT-2 while introducing distinct algorithmic benefits. Our empirical results challenge the longstanding dominance of autoregressive modeling and could pave the way for an alternative class of language models built from radically different principles.

For more information, please contact Sumaia Abedin by phone at (626) 395-6704 or by email at [email protected], or visit https://www.cms.caltech.edu/news-events/keller-colloquium.