CAIA Speaker Event: Gabriel Wu, OpenAI @ Caltech

Friday, February 27, 2026

5:00pm to 6:00pm

Chen 100

Gabriel Wu, OpenAI

Public Event

Who: Gabriel Wu, OpenAI (In-Person)

When: February 27, 5–6 pm PT

Where: Chen 100, Caltech

Title: Teaching LLMs to Confess

Abstract: We train GPT-5 to self-report misbehavior by producing an auxiliary "confession message" that receives an independent reward during RL. We find that models are typically honest in their confessions, and this honesty increases with training. We will also discuss connections between our approach and standard chain-of-thought monitoring, and whether we expect confessions to work on more egregiously misaligned models.

Bio: Gabriel Wu is a researcher on the Alignment team at OpenAI where he works on training models to more reliably follow human instructions. Previously, he worked at the Alignment Research Center and led the AI Safety Student Team at Harvard.

Everyone is welcome: no specific technical background is required. Come learn and ask questions. And yes, we will have pizza and boba.

For more information, please contact Shuhul Mujoo by phone at (408) 886-0958 or by email at [email protected], or see the event page on Luma.