OpenAI
OpenAI is one of the world’s leading AI research laboratories and the developer of the GPT series of large language models. Founded in 2015 as a non-profit with the mission of ensuring that artificial general intelligence benefits all of humanity, OpenAI has since restructured, first into a capped-profit company and, in 2025, into a public benefit corporation under nonprofit control, and has become one of the most commercially successful AI companies in the world. It appears throughout this wiki as both a key institution in AI capability development and a central player in ai-safety efforts.
Structure: public benefit corporation.
Safety Teams (per Shallow Review 2025)
- Alignment
- Safety Systems — sub-teams: Interpretability, Safety Oversight, Pretraining Safety, Robustness, Safety Research, Trustworthy AI, plus a new Misalignment Research team
- Preparedness — capability evaluations and threat modeling for catastrophic risks
- Model Policy
- Safety and Security Committee
- Safety Advisory Group
OpenAI has had no named successor to Superalignment since that project was dissolved in 2024. The 2025 Persona Features paper had an author list distinct from the formal safety teams listed above.
Risk Management Framework
OpenAI’s Preparedness Framework (v2) defines capability tiers and required safeguards. Sister frameworks at other labs: anthropic’s responsible-scaling-policy and deepmind’s Frontier Safety Framework.
Public Alignment Agenda
OpenAI has no explicit public alignment agenda as of 2025. Boaz Barak has published personal views, including the essay Machines of Faithful Obedience, but these are not institutional positions.
Key People (current safety org)
Johannes Heidecke, Boaz Barak, Mia Glaese, Jenny Nitishinskaya, Lama Ahmad, Naomi Bashkansky, Miles Wang, Wojciech Zaremba, David Robinson, Zico Kolter, Jerry Tworek, Eric Wallace, Olivia Watkins, Kai Chen, Chris Koch, Andrea Vallone, Leo Gao.
Historical Safety Figures
Several prominent AI safety researchers have been associated with OpenAI:
- paul-christiano — Developed iterative-amplification while at OpenAI before founding ARC
- jan-leike — Led the superalignment project before departing for anthropic in 2024
- nick-joseph — Co-founder of anthropic; left OpenAI to help build a more safety-focused lab
- leopold-aschenbrenner — Former researcher who authored Situational Awareness
- daniel-kokotajlo — Former researcher who led creation of AI 2027
The Superalignment Project (2023–2024)
OpenAI’s most significant institutional commitment to safety was the superalignment project, co-led by jan-leike and Ilya Sutskever. The project pledged 20% of the compute OpenAI had secured to date toward solving the alignment problem for superintelligent AI within four years. Its research agenda focused on three pillars: mechanistic interpretability, generalization, and scalable-oversight.
The Superalignment project’s core strategy was to automate alignment research using AI itself, building AI systems that could help solve alignment for even more powerful successors. The team was effectively dissolved in 2024 after Leike and several others departed; the departing researchers’ public statements turned the tension between commercial pressures and safety commitments at the lab into a major debate in the field.
Funding
Microsoft (largest investor), AWS, Oracle, NVIDIA, SoftBank, G42, AMD, Dragoneer, Coatue, Thrive, Altimeter, MGX, Blackstone, TPG, T. Rowe Price, Andreessen Horowitz, D1 Capital Partners, Fidelity Investments, Founders Fund, Sequoia, and others.
Notable 2025 Outputs (selection from SR2025)
- 60-page System Cards now contain a large amount of OpenAI’s public safety work
- Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
- Persona Features Control Emergent Misalignment
- Stress Testing Deliberative Alignment for Anti-Scheming Training
- Deliberative Alignment: Reasoning Enables Safer Language Models
- Toward understanding and preventing misalignment generalization
- Trading Inference-Time Compute for Adversarial Robustness
- Findings from a pilot Anthropic–OpenAI alignment evaluation exercise
- Weight-sparse transformers have interpretable circuits
OpenAI also maintains alignment.openai.com and a Safety Evaluations Hub.
Critiques
External critiques aggregated in SR2025 include Stein-Perlman, MIRI’s response to “How We Think About Safety and Alignment”, underelicitation in evaluation reports, the Midas Project on transparency, the Anduril defense partnership, and Carlsmith on frontier labs in general.
A recurring theme: OpenAI is difficult to model as a single agent — Altman himself has said “I very rarely get to have anybody work on anything… researchers are going to work on what they’re going to work on, and that’s that.”
In AI 2027
In the AI 2027 scenario, “OpenBrain”, a thinly veiled stand-in for OpenAI, is the leading US AI project that builds AI agents capable of dramatically accelerating AI research, triggering the intelligence explosion. The scenario explores how such a lab might navigate the tension between racing ahead and pausing for safety, ultimately depicting two branching endings: a catastrophic “race” outcome and a cautiously optimistic “slowdown” outcome.
Significance
OpenAI occupies a unique and contested position in the AI safety landscape. It has produced pioneering alignment research and made substantial institutional commitments to safety, while simultaneously pushing the frontier of AI capabilities at a pace that many safety researchers consider dangerous. The departure of multiple safety-focused researchers has fueled ongoing debate about whether frontier AI labs can genuinely prioritize safety while competing commercially.
Related Pages
- superalignment
- ai-alignment
- ai-safety
- scalable-oversight
- anthropic
- deepmind
- paul-christiano
- jan-leike
- leopold-aschenbrenner
- daniel-kokotajlo
- 80k-podcast-jan-leike-superalignment
- ai-2027
- ai-governance
- intelligence-explosion
- deceptive-alignment
- interpretability
- robustness
- capability-evaluations
- concrete-problems-in-ai-safety
- iterative-amplification
- nick-joseph
- responsible-scaling-policy
- situational-awareness
- catherine-olsson
- daniel-ziegler
- future-of-humanity-institute
- metr
- redwood-research
- rob-wiblin
- 80k-podcast-nick-joseph-anthropic-safety
- 80k-podcast-olsson-ziegler-ml-engineering
- 80k-podcast-paul-christiano
- summary-bostrom-ai-policy