AI Agents
AI agents are AI systems capable of taking sequences of actions in the world — using tools, controlling computers, browsing the web, writing and executing code, and pursuing multi-step goals autonomously. They represent a qualitative shift from AI as a question-answering tool to AI as an active participant.
What Makes a System Agentic
A base language model responds to a prompt and stops. An agent wraps that model in a scaffold (sketched in code after this list) that allows it to:
- Use tools (search, code execution, file systems)
- Take actions whose results feed back into subsequent steps
- Pursue goals over extended time horizons
- Operate with minimal human intervention per step
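A minimal sketch of such a scaffold, in Python. This is illustrative only: `call_model`, the tool registry, and the message format are assumptions for the example, not any particular framework's or model provider's API.

```python
# Minimal agent-loop sketch. `call_model` and the tools are hypothetical stand-ins;
# real scaffolds add planning, memory, sandboxing, and a real model API.

def search(query: str) -> str:
    return f"(stub) results for {query!r}"      # placeholder tool

TOOLS = {"search": search}

def call_model(messages: list[dict]) -> dict:
    # Hypothetical stand-in for a real model call, scripted so the sketch runs:
    # first request a search, then return a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search", "args": {"query": messages[0]["content"]}}
    return {"final": "answer based on: " + messages[-1]["content"]}

def run_agent(goal: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):                  # bounded autonomy
        decision = call_model(messages)
        if "final" in decision:                 # the model decides it is done
            return decision["final"]
        result = TOOLS[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "content": result})  # result feeds back in
    return "step budget exhausted"

print(run_agent("What changed in the latest release?"))
```

The defining feature is the loop: each tool result is appended to the conversation and shapes the next model call, which is what distinguishes an agent from a single-turn completion.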
leopold-aschenbrenner’s Situational Awareness frames “computer use” (multimodal models that control computers the way humans do) as a key “unhobbling” gain that turns capable models into true agents. Unhobbling is one of the three drivers of effective compute growth in that framing, alongside raw compute scaling and algorithmic efficiency.
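As a rough illustration of how the three drivers combine (the specific numbers below are made up, not figures from Situational Awareness): gains are counted in orders of magnitude (OOMs), which add when the underlying multiplicative factors are combined.

```python
# Hypothetical OOM gains per driver; illustrative numbers only, not sourced figures.
raw_compute_ooms = 2.0    # hardware and cluster scale-up (assumed)
algorithmic_ooms = 1.5    # training-efficiency improvements (assumed)
unhobbling_ooms  = 1.0    # scaffolding, tools, computer use (assumed)

total_ooms = raw_compute_ooms + algorithmic_ooms + unhobbling_ooms
print(f"effective compute gain ~ 10^{total_ooms:.1f} = {10**total_ooms:,.0f}x")
```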
Capability Jump from Scaffolding
The capability gain from agentic scaffolding is substantial even without model improvements. GPT-4 scored 2% on the SWE-Bench software engineering benchmark as a bare model, but 14-23% when wrapped in an agent framework. This represents a 7-11x jump from scaffolding alone — equivalent to a major model upgrade in terms of real-world task performance.
Safety Implications
Agentic AI systems raise distinct safety challenges compared to single-turn models:
- Error compounding: Mistakes in early steps cascade through subsequent actions.
- Irreversibility: Actions in the world (sending emails, modifying files, executing code) may be difficult or impossible to undo.
- Oversight difficulty: Supervising a long chain of actions is harder than reviewing a single response.
- Deceptive alignment: An agent with misaligned goals has more opportunity to act on them during extended autonomous operation.
These challenges connect to active research in scalable-oversight, ai-control, and interpretability.
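To make one of these challenges concrete, a mitigation pattern often discussed in ai-control work is to gate potentially irreversible actions behind human approval. The sketch below is illustrative; the tool names and the reversibility classification are assumptions, not a method from the cited sources.

```python
# Illustrative oversight gate: auto-approve known-reversible tools, require human
# sign-off for irreversible ones, and default-deny anything unclassified.
REVERSIBLE = {"search", "read_file"}                        # assumed classification
IRREVERSIBLE = {"send_email", "delete_file", "run_shell"}   # assumed classification

def execute_with_oversight(tool_name: str, args: dict, tools: dict) -> str:
    if tool_name in IRREVERSIBLE:
        print(f"approval needed: {tool_name}({args})")
        if input("approve? [y/N] ").strip().lower() != "y":
            return "action rejected by human overseer"
    elif tool_name not in REVERSIBLE:
        return f"unknown tool {tool_name!r} refused"         # fail closed
    return tools[tool_name](**args)
```

Defaulting to denial for unclassified tools is the conservative choice here: the gate fails closed rather than open.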
Relation to Intelligence Explosion
Agents that can write code and conduct research are a prerequisite for the intelligence-explosion scenario. Once AI agents can improve AI training pipelines, the feedback loop between AI capability and AI-assisted research begins to accelerate.
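A toy model of that feedback loop (purely illustrative; the parameters are invented and this is not a forecast or a model from the cited sources): research effort is human labor plus an AI contribution that grows with capability, and capability growth scales with total effort.

```python
# Toy feedback loop: AI agents add research effort, which raises capability,
# which raises the AI contribution next round. All numbers are invented.
human_effort = 1.0
capability   = 1.0
automation   = 0.1    # fraction of research tasks AI agents can take on (assumed)

for year in range(1, 9):
    effort = human_effort + automation * capability   # AI-assisted research labor
    capability *= 1 + 0.3 * effort                    # progress scales with effort (assumed)
    automation = min(1.0, automation + 0.1)           # agents handle more over time
    print(f"year {year}: capability ~ {capability:.1f}")
```

Growth stays roughly exponential while humans dominate the effort term, then accelerates sharply once the AI contribution does, which is the qualitative point of the feedback-loop argument.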
Related Pages
- transformative-ai
- intelligence-explosion
- scalable-oversight
- ai-control
- deceptive-alignment
- interpretability
- scaling-laws
- sa-ch1-from-gpt4-to-agi
- situational-awareness
- leopold-aschenbrenner
- ai-autonomy-levels
- foundation-models
- autonomy-evals
- ai-safety-atlas-textbook
- atlas-ch1-capabilities-04-current-capabilities
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- AI Safety Atlas Ch.1 — Current Capabilities — referenced as [[atlas-ch1-capabilities-04-current-capabilities]]
- Summary: Situational Awareness — The Decade Ahead — referenced as [[situational-awareness]]