Foundation Models
Foundation models are large-scale AI systems pre-trained on massive unlabeled datasets via self-supervised learning, producing general-purpose substrates that can be specialized to many downstream tasks through fine-tuning or in-context prompting. They represent a paradigm shift from earlier “one model per task” approaches and are the architectural basis for most current frontier AI systems.
The Paradigm Shift
The traditional approach trained a specialized model for each task on human-labeled data, bottlenecked by labeling costs and the inability to transfer knowledge across tasks. Foundation models broke through this by:
- Training on massive unlabeled data via self-supervised objectives such as next-token prediction and masked reconstruction (a minimal sketch follows this list)
- Producing general representations that transfer across tasks
- Enabling rapid adaptation through fine-tuning, RLHF, or prompting
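To make the first objective concrete, the sketch below shows a next-token prediction loss in PyTorch. The toy shapes and random data are illustrative assumptions rather than any lab's actual training code; the point is that raw text supplies its own labels.

```python
import torch
import torch.nn.functional as F

# Toy illustration of the next-token prediction objective.
# `logits` has shape (batch, seq_len, vocab); `tokens` has shape (batch, seq_len).
def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    pred = logits[:, :-1, :]   # predictions for positions 0..T-2
    target = tokens[:, 1:]     # the "labels" are simply the next tokens
    return F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),  # flatten to (batch * (T-1), vocab)
        target.reshape(-1),
    )

# No human annotation anywhere: the text supervises itself.
batch, seq_len, vocab = 4, 128, 50_000
logits = torch.randn(batch, seq_len, vocab)         # stand-in for model output
tokens = torch.randint(0, vocab, (batch, seq_len))  # stand-in for a text batch
loss = next_token_loss(logits, tokens)
```

Masked reconstruction works analogously, except randomly hidden tokens are predicted from context on both sides.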
The breakthrough was enabled by specialized GPU hardware, transformer architectures, and the explosion of online text/image/video data. Examples include GPT-4 and Claude (language), DALL-E 3 and Stable Diffusion (vision), GPT-4V and Gemini (multimodal), and Gato (multi-task RL).
Foundation Models vs. Frontier Models
The distinction matters: foundation models are general-purpose substrates; frontier models are the cutting edge of capability in a given domain. The two categories overlap (Claude 3.5 Sonnet is both) but are not identical: AlphaFold is frontier in protein structure prediction without being a foundation model.
Training Pipeline
A two-stage pipeline:
- Pre-training — the model learns broad patterns from millions to billions of unlabeled examples with no specific task in mind
- Adaptation — either of two modes (both sketched after this list):
  - Fine-tuning — additional weight updates on a specific task, or preference training with RLHF
  - Prompting / in-context learning — guiding behavior via crafted inputs without updating weights
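A minimal sketch of the two adaptation modes, assuming a stand-in `nn.Linear` where a real system would have a pretrained transformer; the loss, shapes, and learning rate are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for a pretrained foundation model (illustrative, not a real architecture).
model = nn.Linear(768, 768)

# (1) Fine-tuning: adapt by further gradient updates on task data.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def fine_tune_step(x: torch.Tensor, y: torch.Tensor) -> float:
    loss = F.mse_loss(model(x), y)  # placeholder task-specific loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# (2) Prompting / in-context learning: weights stay frozen; the task lives
# entirely in the input, so adaptation costs only a forward pass.
with torch.no_grad():
    output = model(torch.randn(1, 768))  # inference only, no weight updates
```

The asymmetry is the design point: fine-tuning changes the shared weights for every downstream user, while prompting leaves them untouched.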
Properties That Matter for Safety
Transfer Learning
Knowledge from pre-training transfers to new tasks. Critically, harmful biases and undesired behaviors transfer alongside useful knowledge.
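The mechanics are visible even in a toy setup: freeze the pretrained layers and train only a new head, as sketched below. The module sizes are illustrative; the safety-relevant point is that whatever the frozen backbone encodes, including unwanted biases, is inherited wholesale.

```python
import torch.nn as nn

# Illustrative stand-in for pretrained layers carrying transferred knowledge.
backbone = nn.Sequential(nn.Linear(768, 768), nn.GELU())
for p in backbone.parameters():
    p.requires_grad = False  # freeze: pre-trained knowledge transfers as-is

head = nn.Linear(768, 2)     # new task-specific classifier, trained from scratch
model = nn.Sequential(backbone, head)
# Only head.parameters() would be passed to an optimizer; any bias encoded in
# the frozen backbone is inherited unchanged by the downstream model.
```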
Zero-shot and Few-shot Learning
Foundation models perform new tasks with very few examples or none. This emergent generalization is powerful but makes deployment behavior harder to predict.
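Few-shot adaptation is just prompt construction, as the sketch below shows; the sentiment task, the examples, and the commented-out completion call are all hypothetical.

```python
# Building a few-shot prompt: the "training set" is a handful of examples
# placed directly in the input.
examples = [
    ("I loved this film", "positive"),
    ("Utterly boring", "negative"),
]
query = "A surprising, wonderful movie"

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:   # k examples -> k-shot; k = 0 is zero-shot
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

# A frozen foundation model completing this prompt performs the new task
# without a single weight update, e.g. via some hypothetical client:
# completion = client.complete(prompt)
```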
Cross-domain Generalization with Goal Decoupling
Capabilities generalize across domains, but goals and constraints often don’t generalize alongside them. A model might generalize text-manipulation skill to a new domain without preserving safety constraints meant to apply broadly. This is the core mechanism behind deceptive-alignment and goal-misgeneralization.
Why Foundation Models Are a Safety Paradigm Shift
Foundation models introduce risk classes that didn’t exist for narrow systems:
- Power centralization — only large actors can train base models from scratch, concentrating capability in a few labs
- Homogenization — many downstream systems inherit the blind spots and flaws of a single base model, creating correlated failure modes across the ecosystem
- Dual-use capability — emergent capabilities expand the attack surface unpredictably
- Misalignment urgency — risks move from theoretical to immediate as capabilities expand
Many of the safety challenges discussed elsewhere in the field — from goal-misgeneralization to scalable-oversight — are deeply connected to how these models are trained and adapted.
Connection to Other Concepts
- scaling-laws — describes how foundation model capability scales with compute, data, and parameters
- bitter-lesson — foundation models are the canonical instance of “general methods leveraging computation beat hand-crafted solutions”
- transformative-ai — foundation models are the substrate for plausible TAI scenarios
- ai-population-explosion — the parallelism of foundation-model copies is what makes the population-explosion thesis work
- rag — retrieval-augmented generation, a common adaptation pattern that grounds inference in retrieved documents (sketched below)
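As a concrete illustration of the RAG pattern, here is a minimal retrieve-then-prompt loop. The `embed` function is a deterministic toy standing in for a real embedding model, and the document snippets are invented for the example.

```python
import numpy as np

# Toy embedding: deterministic per string within a run, so the sketch executes.
def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

documents = [
    "Foundation models are pre-trained on unlabeled data.",
    "RLHF fine-tunes models from human preference comparisons.",
    "Scaling laws relate loss to compute, data, and parameters.",
]
index = np.stack([embed(d) for d in documents])  # tiny in-memory vector index

def retrieve(query: str, k: int = 2) -> list:
    scores = index @ embed(query)                # cosine similarity (unit vectors)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "How are foundation models trained?"
context = "\n".join(retrieve(query))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
# A frozen foundation model would then answer from `prompt`: retrieval, not
# weight updates, injects the new knowledge.
```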
Related Pages
- scaling-laws
- bitter-lesson
- rlhf
- transformative-ai
- intelligence-explosion
- ai-population-explosion
- deceptive-alignment
- ai-alignment
- rag
- atlas-ch1-capabilities-02-foundation-models
- ai-safety-atlas-textbook
- situational-awareness
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- AI Safety Atlas Ch.1 — Foundation Models — referenced as [[atlas-ch1-capabilities-02-foundation-models]]
- Summary: Situational Awareness — The Decade Ahead — referenced as [[situational-awareness]]