Catherine Olsson

Catherine Olsson is a machine learning (ML) safety researcher. At the time of her 80,000 Hours podcast interview, she was working on the Google Brain safety team, having previously worked at OpenAI. She focuses on the practical, engineering-driven side of AI alignment research.

Work

Olsson’s work centers on making AI systems safe through concrete implementation rather than purely theoretical approaches. She describes the AI safety challenge as having two parts: giving the system the right objective (outer alignment) and ensuring it optimizes for that objective robustly across different situations.

Her hands-on contributions include implementing prototypes for reward learning: using large volumes of human feedback to train AI systems to behave correctly. This work at OpenAI was early groundwork for what became reinforcement learning from human feedback (RLHF).
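The interview does not describe her prototypes in detail. As an illustration of the general technique only, the sketch below trains a reward model on pairwise human comparisons using a Bradley-Terry objective, in the spirit of the learning-from-human-preferences line of work that RLHF grew out of. The names (RewardModel, preference_loss) and the synthetic preference labels are assumptions for illustration, not her implementation.

    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Maps a per-step feature vector to a scalar reward estimate.
        (Hypothetical model for illustration; not Olsson's code.)"""
        def __init__(self, obs_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x).squeeze(-1)  # per-step scalar rewards

    def preference_loss(model, seg_a, seg_b, prefer_a):
        """Bradley-Terry loss on pairwise human comparisons.

        seg_a, seg_b: (batch, steps, obs_dim) trajectory segments.
        prefer_a:     (batch,) 1.0 where the human preferred segment A.
        """
        # Sum per-step reward estimates over each segment.
        r_a = model(seg_a).sum(dim=1)
        r_b = model(seg_b).sum(dim=1)
        # P(A preferred) = sigmoid(r_a - r_b) under the Bradley-Terry model.
        return nn.functional.binary_cross_entropy_with_logits(r_a - r_b, prefer_a)

    # Toy training loop on synthetic comparisons.
    obs_dim = 8
    model = RewardModel(obs_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(100):
        seg_a = torch.randn(32, 10, obs_dim)
        seg_b = torch.randn(32, 10, obs_dim)
        # Stand-in for human labels; a real system queries annotators.
        prefer_a = (seg_a.sum(dim=(1, 2)) > seg_b.sum(dim=(1, 2))).float()
        loss = preference_loss(model, seg_a, seg_b, prefer_a)
        opt.zero_grad()
        loss.backward()
        opt.step()

In a full reward-learning pipeline, the trained model's scalar output would stand in for the environment reward when training a policy with RL, and the comparison labels would come from human annotators rather than a synthetic rule.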

Olsson is notable for articulating that ML engineers, not just alignment theorists, can make meaningful contributions to AI safety. Strong software engineering skills transfer directly to safety-relevant ML work, and the field needs people who can bridge the gap between theoretical ideas and working code.

Significance

Olsson represents a pragmatic, empirical strand of AI safety work that complements more theoretical alignment research. Her career trajectory, from OpenAI to the Google Brain safety team, illustrates how safety-oriented work is spreading across frontier labs.