OpenAI
OpenAI is one of the world’s leading AI research laboratories and the developer of the GPT series of large language models. Founded in 2015 as a non-profit with the mission of ensuring that artificial general intelligence benefits all of humanity, OpenAI has since restructured, first into a capped-profit company and, in 2025, into a public benefit corporation under nonprofit control, and has become one of the most commercially successful AI companies in the world. It appears throughout this wiki as both a key institution in AI capability development and a central player in ai-safety efforts.
Structure: public benefit corporation.
Safety Teams (per Shallow Review 2025)
- Alignment
- Safety Systems — sub-teams: Interpretability, Safety Oversight, Pretraining Safety, Robustness, Safety Research, Trustworthy AI, plus a new Misalignment Research team
- Preparedness — capability evaluations and threat modeling for catastrophic risks
- Model Policy
- Safety and Security Committee
- Safety Advisory Group
OpenAI has had no named successor to Superalignment since that project was dissolved in 2024. The 2025 Persona Features paper had an author list distinct from the formal safety teams listed above.
Risk Management Framework
OpenAI’s Preparedness Framework (v2) defines capability tiers and required safeguards. Sister frameworks at other labs: anthropic’s responsible-scaling-policy and deepmind’s Frontier Safety Framework.
Public Alignment Agenda
OpenAI has no explicit public alignment agenda as of 2025. Boaz Barak has published personal views, including the essay Machines of Faithful Obedience, but these are not institutional positions.
Key People (current safety org)
Johannes Heidecke, Boaz Barak, Mia Glaese, Jenny Nitishinskaya, Lama Ahmad, Naomi Bashkansky, Miles Wang, Wojciech Zaremba, David Robinson, Zico Kolter, Jerry Tworek, Eric Wallace, Olivia Watkins, Kai Chen, Chris Koch, Andrea Vallone, Leo Gao.
Historical Safety Figures
Several prominent AI safety researchers have been associated with OpenAI:
- paul-christiano — Developed iterative-amplification while at OpenAI before founding ARC
- jan-leike — Led the superalignment project before departing for anthropic in 2024
- nick-joseph — Co-founder of anthropic; left OpenAI to help build a more safety-focused lab
- leopold-aschenbrenner — Former researcher who authored Situational Awareness
- daniel-kokotajlo — Former researcher who led creation of AI 2027
The Superalignment Project (2023–2024)
OpenAI’s most significant institutional commitment to safety was the superalignment project, co-led by jan-leike and Ilya Sutskever. The project pledged 20% of the compute OpenAI had secured to date toward solving the alignment problem for superintelligent AI within four years. Its research agenda focused on three pillars: mechanistic interpretability, generalization, and scalable-oversight.
The Superalignment project’s core strategy was to automate alignment research using AI itself, building AI systems that could help solve alignment for even more powerful successors. The team was effectively dissolved in 2024 after Leike and several others departed; the departing researchers’ public statements turned the tension between commercial pressures and safety commitments at the lab into a major debate in the field.
Funding
Microsoft (largest investor), AWS, Oracle, NVIDIA, SoftBank, G42, AMD, Dragoneer, Coatue, Thrive, Altimeter, MGX, Blackstone, TPG, T. Rowe Price, Andreessen Horowitz, D1 Capital Partners, Fidelity Investments, Founders Fund, Sequoia, and others.
Notable 2025 Outputs (selection from SR2025)
- 60-page System Cards now contain a large amount of OpenAI’s public safety work
- Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
- Persona Features Control Emergent Misalignment
- Stress Testing Deliberative Alignment for Anti-Scheming Training
- Deliberative Alignment: Reasoning Enables Safer Language Models
- Toward understanding and preventing misalignment generalization
- Trading Inference-Time Compute for Adversarial Robustness
- Findings from a pilot Anthropic–OpenAI alignment evaluation exercise
- Weight-sparse transformers have interpretable circuits
OpenAI also maintains alignment.openai.com and a Safety Evaluations Hub.
Critiques
External critiques aggregated in SR2025 include Stein-Perlman, MIRI’s response to “How We Think About Safety and Alignment”, underelicitation in evaluation reports, the Midas Project on transparency, the Anduril defense partnership, and Carlsmith on frontier labs in general.
A recurring theme: OpenAI is difficult to model as a single agent — Altman himself has said “I very rarely get to have anybody work on anything… researchers are going to work on what they’re going to work on, and that’s that.”
In AI 2027
In the AI 2027 scenario, “OpenBrain”, a thinly veiled stand-in for OpenAI, is the leading US AI project that builds AI agents capable of dramatically accelerating AI research, triggering the intelligence explosion. The scenario explores how such a lab might navigate the tension between racing ahead and pausing for safety, ultimately depicting two branching endings: a catastrophic “race” outcome and a cautiously optimistic “slowdown” outcome.
Significance
OpenAI occupies a unique and contested position in the AI safety landscape. It has produced pioneering alignment research and made substantial institutional commitments to safety, while simultaneously pushing the frontier of AI capabilities at a pace that many safety researchers consider dangerous. The departure of multiple safety-focused researchers has fueled ongoing debate about whether frontier AI labs can genuinely prioritize safety while competing commercially.
Related Pages
- superalignment
- ai-alignment
- ai-safety
- scalable-oversight
- anthropic
- deepmind
- paul-christiano
- jan-leike
- leopold-aschenbrenner
- daniel-kokotajlo
- 80k-podcast-jan-leike-superalignment
- ai-2027
- ai-governance
- intelligence-explosion
- deceptive-alignment
- interpretability
- robustness
- capability-evaluations
- concrete-problems-in-ai-safety
- iterative-amplification
- nick-joseph
- responsible-scaling-policy
- situational-awareness
- catherine-olsson
- daniel-ziegler
- future-of-humanity-institute
- metr
- redwood-research
- rob-wiblin
- 80k-podcast-nick-joseph-anthropic-safety
- 80k-podcast-olsson-ziegler-ml-engineering
- 80k-podcast-paul-christiano
- summary-bostrom-ai-policy