Caltech

CAIA Speaker Event: Gabriel Wu, OpenAI @ Caltech

Friday, February 27, 2026
5:00pm to 6:00pm
Chen 100
Gabriel Wu, OpenAI
  • Public Event

Who: Gabriel Wu, OpenAI (In-Person)

When: February 27, 5–6 pm PT

Where: Chen 100, Caltech

Title: Teaching LLMs to Confess

Abstract: We train GPT-5 to self-report misbehavior by producing an auxiliary "confession message" that receives an independent reward during RL. We find that models are typically honest in their confessions, and this honesty increases with training. We will also discuss connections between our approach and standard chain-of-thought monitoring, and whether we expect confessions to work on more egregiously misaligned models.

Bio: Gabriel Wu is a researcher on the Alignment team at OpenAI, where he works on training models to follow human instructions more reliably. Previously, he worked at the Alignment Research Center and led the AI Safety Student Team at Harvard.

Everyone is welcome: no specific technical background is required. Come learn and ask questions. And yes, we will have pizza and boba.

For more information, please contact Shuhul Mujoo by phone at 4088860958 or by email at [email protected], or visit the event's Luma page.