Foundation Models
Foundation models are large-scale AI systems pre-trained on massive unlabeled datasets via self-supervised learning, producing general-purpose substrates that can be specialized to many downstream tasks through fine-tuning or in-context prompting. They represent a paradigm shift from earlier “one model per task” approaches and are the architectural basis for most current frontier AI systems.
The Paradigm Shift
The traditional approach trained a specialized model for each task on human-labeled data, bottlenecked by labeling costs and the inability to transfer knowledge across tasks. Foundation models broke through this by:
- Training on massive unlabeled data via self-supervised objectives such as next-token prediction and masked reconstruction (a minimal sketch follows this list)
- Producing general representations that transfer across tasks
- Enabling rapid adaptation through fine-tuning, RLHF, or prompting
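To make the first objective concrete, the sketch below shows a next-token prediction loss in PyTorch. The toy shapes and random data are illustrative assumptions rather than any lab's actual training code; the point is that raw text supplies its own labels.

```python
import torch
import torch.nn.functional as F

# Toy illustration of the next-token prediction objective.
# `logits` has shape (batch, seq_len, vocab); `tokens` has shape (batch, seq_len).
def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    pred = logits[:, :-1, :]   # predictions for positions 0..T-2
    target = tokens[:, 1:]     # the "labels" are simply the next tokens
    return F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),  # flatten to (batch * (T-1), vocab)
        target.reshape(-1),
    )

# No human annotation anywhere: the text supervises itself.
batch, seq_len, vocab = 4, 128, 50_000
logits = torch.randn(batch, seq_len, vocab)         # stand-in for model output
tokens = torch.randint(0, vocab, (batch, seq_len))  # stand-in for a text batch
loss = next_token_loss(logits, tokens)
```

Masked reconstruction works analogously, except randomly hidden tokens are predicted from context on both sides.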
The breakthrough was enabled by specialized GPU hardware, transformer architectures, and the explosion of online text/image/video data. Examples include GPT-4 and Claude (language), DALL-E 3 and Stable Diffusion (vision), GPT-4V and Gemini (multimodal), and Gato (multi-task RL).
Foundation Models vs. Frontier Models
The distinction matters: foundation models are general-purpose substrates; frontier models are the cutting edge of capability in a given domain. The two categories overlap (Claude 3.5 Sonnet is both) but are not identical: AlphaFold is frontier in protein structure prediction without being a foundation model.
Training Pipeline
A two-stage pipeline:
- Pre-training — the model learns broad patterns from millions to billions of unlabeled examples with no specific task in mind
- Adaptation — either of two modes (both sketched after this list):
  - Fine-tuning — additional weight updates on a specific task, or preference training with RLHF
  - Prompting / in-context learning — guiding behavior via crafted inputs without updating weights
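A minimal sketch of the two adaptation modes, assuming a stand-in `nn.Linear` where a real system would have a pretrained transformer; the loss, shapes, and learning rate are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for a pretrained foundation model (illustrative, not a real architecture).
model = nn.Linear(768, 768)

# (1) Fine-tuning: adapt by further gradient updates on task data.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def fine_tune_step(x: torch.Tensor, y: torch.Tensor) -> float:
    loss = F.mse_loss(model(x), y)  # placeholder task-specific loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# (2) Prompting / in-context learning: weights stay frozen; the task lives
# entirely in the input, so adaptation costs only a forward pass.
with torch.no_grad():
    output = model(torch.randn(1, 768))  # inference only, no weight updates
```

The asymmetry is the design point: fine-tuning changes the shared weights for every downstream user, while prompting leaves them untouched.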
Properties That Matter for Safety
Transfer Learning
Knowledge from pre-training transfers to new tasks. Critically, harmful biases and undesired behaviors transfer alongside useful knowledge.
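The mechanics are visible even in a toy setup: freeze the pretrained layers and train only a new head, as sketched below. The module sizes are illustrative; the safety-relevant point is that whatever the frozen backbone encodes, including unwanted biases, is inherited wholesale.

```python
import torch.nn as nn

# Illustrative stand-in for pretrained layers carrying transferred knowledge.
backbone = nn.Sequential(nn.Linear(768, 768), nn.GELU())
for p in backbone.parameters():
    p.requires_grad = False  # freeze: pre-trained knowledge transfers as-is

head = nn.Linear(768, 2)     # new task-specific classifier, trained from scratch
model = nn.Sequential(backbone, head)
# Only head.parameters() would be passed to an optimizer; any bias encoded in
# the frozen backbone is inherited unchanged by the downstream model.
```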
Zero-shot and Few-shot Learning
Foundation models perform new tasks with very few examples or none. This emergent generalization is powerful but makes deployment behavior harder to predict.
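Few-shot adaptation is just prompt construction, as the sketch below shows; the sentiment task, the examples, and the commented-out completion call are all hypothetical.

```python
# Building a few-shot prompt: the "training set" is a handful of examples
# placed directly in the input.
examples = [
    ("I loved this film", "positive"),
    ("Utterly boring", "negative"),
]
query = "A surprising, wonderful movie"

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:   # k examples -> k-shot; k = 0 is zero-shot
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

# A frozen foundation model completing this prompt performs the new task
# without a single weight update, e.g. via some hypothetical client:
# completion = client.complete(prompt)
```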
Cross-domain Generalization with Goal Decoupling
Capabilities generalize across domains, but goals and constraints often don’t generalize alongside them. A model might generalize text-manipulation skill to a new domain without preserving safety constraints meant to apply broadly. This is the core mechanism behind deceptive-alignment and goal-misgeneralization.
Why Foundation Models Are a Safety Paradigm Shift
Foundation models introduce risk classes that didn’t exist for narrow systems:
- Power centralization — only large actors can train base models from scratch, concentrating capability in a few labs
- Homogenization — many downstream systems inherit the blind spots and flaws of a single base model, creating correlated failure modes across the ecosystem
- Dual-use capability — emergent capabilities expand the attack surface unpredictably
- Misalignment urgency — risks move from theoretical to immediate as capabilities expand
Many of the safety challenges discussed elsewhere in the field — from goal-misgeneralization to scalable-oversight — are deeply connected to how these models are trained and adapted.
Connection to Other Concepts
- scaling-laws — describes how foundation model capability scales with compute, data, and parameters
- bitter-lesson — foundation models are the canonical instance of “general methods leveraging computation beat hand-crafted solutions”
- transformative-ai — foundation models are the substrate for plausible TAI scenarios
- ai-population-explosion — the parallelism of foundation-model copies is what makes the population-explosion thesis work
- rag — retrieval-augmented generation, a common adaptation pattern that grounds inference in retrieved documents (sketched below)
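As a concrete illustration of the RAG pattern, here is a minimal retrieve-then-prompt loop. The `embed` function is a deterministic toy standing in for a real embedding model, and the document snippets are invented for the example.

```python
import numpy as np

# Toy embedding: deterministic per string within a run, so the sketch executes.
def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

documents = [
    "Foundation models are pre-trained on unlabeled data.",
    "RLHF fine-tunes models from human preference comparisons.",
    "Scaling laws relate loss to compute, data, and parameters.",
]
index = np.stack([embed(d) for d in documents])  # tiny in-memory vector index

def retrieve(query: str, k: int = 2) -> list:
    scores = index @ embed(query)                # cosine similarity (unit vectors)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "How are foundation models trained?"
context = "\n".join(retrieve(query))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
# A frozen foundation model would then answer from `prompt`: retrieval, not
# weight updates, injects the new knowledge.
```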
Related Pages
- scaling-laws
- bitter-lesson
- rlhf
- transformative-ai
- intelligence-explosion
- ai-population-explosion
- deceptive-alignment
- ai-alignment
- rag
- atlas-ch1-capabilities-02-foundation-models
- ai-safety-atlas-textbook
- situational-awareness
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- AI Safety Atlas Ch.1 — Foundation Models — referenced as [[atlas-ch1-capabilities-02-foundation-models]]
- Summary: Situational Awareness — The Decade Ahead — referenced as [[situational-awareness]]