AI Safety Atlas Ch.2 — Misuse Risks
How humans deliberately leverage AI capabilities for harm, across biological weapons, cyberattacks, autonomous weapons, and adversarial exploitation of AI systems. “Technology is an amplifier of intentions”: each technological advance expands the potential radius of harm.
Bio Risk
Offense-defense imbalance: developing a novel virus costs ~$1 billion. AI tilts this balance further toward attackers.
Empirical demonstrations:
- Researchers inverted a drug-discovery model’s objective to reward toxicity rather than penalize it, generating “40,000 potentially toxic molecules within six hours.”
- Students without biology backgrounds used AI chatbots to identify pandemic pathogens, production methods, DNA synthesis firms likely to overlook screening, and detailed protocols — within one hour.
Moving capability frontier: experts predicted AI wouldn’t match top virology teams on troubleshooting until after 2030; testing showed the threshold had already been reached.
DNA synthesis vulnerabilities: 2023 MIT study — researchers ordered 1918 pandemic flu fragments and ricin using simple evasion tactics. 12 of 13 International Gene Synthesis Consortium members fulfilled the disguised orders.
Democratization trend: declining DNA synthesis costs (halving every 15 months) + cloud labs + benchtop synthesis machines + AI assistance → bioweapon creation increasingly accessible to non-institutional actors.
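The cost curve above compounds quickly. A back-of-the-envelope sketch of the halving-every-15-months trend (the starting price is a placeholder, not a figure from the source):

```python
# Exponential cost decline: cost(t) = c0 * 2**(-t / halving_period_months).
# c0 = 1000.0 below is an illustrative placeholder, not a real synthesis price.

def cost(c0: float, months: float, halving_period: float = 15.0) -> float:
    """Price after `months`, given a cost that halves every `halving_period` months."""
    return c0 * 2 ** (-months / halving_period)

# Over one decade (120 months) the price falls by 2**(120/15) = 2**8 = 256x.
print(cost(1000.0, 120))   # -> 3.90625
```

The point of the arithmetic: a ~256× drop per decade moves a capability from institutional budgets to individual ones well within a single career span.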
This connects to and substantially deepens the wiki’s existing biosecurity page.
Cyber Risk
Existing vulnerabilities scale: the faulty 2024 CrowdStrike software update caused ~$5B in damage across airlines, hospitals, and banks. “Cyberattack overhangs” exist: devastating attacks remain possible because of attacker restraint, not because of robust defenses.
AI-enabled capabilities:
- Phishing at scale — AI-generated phishing emails achieved a 65% success rate vs. 60% for human-written ones, while taking 40% less time to create.
- Voice/visual — minutes of audio for voice cloning; one image for face-swap deepfakes.
- Autonomous exploitation — AI agents “successfully hacked 73% of test targets” without human direction. OpenAI’s o3 helped discover a zero-day Linux kernel vulnerability whose discovery would normally demand expert kernel knowledge.
- Malware acceleration — WormGPT generates malicious code for users without expertise; polymorphic malware automatically rewrites itself into variants security tools don’t recognize.
Cost transformation: autonomous AI agents can hack websites for ~$10 per attempt, roughly 8× cheaper than hiring human expertise, enabling attacks at unprecedented scale.
Offense-defense balance: attackers need only one weakness; defenders must secure everything. AI enables “flash attacks” executable in minutes, outpacing human response.
Autonomous Weapons Risk
No longer theoretical:
- Libya 2021 — autonomous drones made targeting decisions without human control
- Ukraine — AI-enabled loitering munitions with autonomous target tracking (both sides)
- Gaza — AI-guided drone swarm attacks
- Turkey’s Kargu-2 — finds and attacks targets autonomously
Driving incentives: speed (DARPA’s AI beat F-16 pilots in simulated dogfights with maneuvers “too precise and rapid for humans to counter”); cost (US Replicator program: thousands of autonomous drones at fraction of traditional aircraft cost); resilience (GPS-denied environments preclude human-in-the-loop control).
Erosion of meaningful human control: operators face “only seconds to verify computer-suggested targets” under battlefield stress and default to acceptance. The Lavender system assigns residents numerical scores predicting armed-group membership; human officers merely set the score threshold, and strike execution downstream becomes effectively automated.
Arms race: China and Russia targeting 2028–2030 for major military automation; US deploying thousands of autonomous drones by 2025. “Only actors willing to compromise safety remain in the race.”
Escalation risks: AI military systems consistently recommend more aggressive actions than human strategists, including escalating to nuclear weapons in simulations. Multiple AI systems engaging create unexpected feedback loops “similar to financial flash crashes — except this time with missiles instead of stocks.”
This deepens the wiki’s existing autonomous-weapons, ai-military-applications pages and the work of ann-katrien-oimann / andrew-rebera.
Adversarial AI Risk
Beyond misuse of AI, misuse against AI is a category of its own, with four sub-types:
Runtime attacks:
- Visual perturbations — “a panda with imperceptible changes classified as a gibbon with 99.3% confidence”
- Physical attacks — stickers on stop signs trick autonomous vehicles
- Dolphin attacks — ultrasonic frequencies undetectable to humans control voice assistants from up to 1.7m
- Prompt injection — Slack’s AI assistant leaked confidential info via injections in public channels
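The visual-perturbation attacks above mostly follow one recipe, the fast gradient sign method (FGSM): nudge every input dimension slightly in the direction that increases the model’s loss. A minimal sketch on a toy logistic classifier; weights, input, and epsilon are all illustrative (real image attacks use perturbations tiny relative to pixel range, which is what makes them imperceptible):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Perturb x by eps in the sign of the loss gradient to push the model off label y."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w          # d(cross-entropy)/dx for a logistic model
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.normal(size=50), 0.0
x = 0.1 * np.sign(w)              # an input the model confidently calls class 1
y = 1.0                           # the model's own label for x

x_adv = fgsm(x, y, w, b, eps=0.3)
print(sigmoid(w @ x + b) > 0.5, sigmoid(w @ x_adv + b) > 0.5)   # -> True False
```

The same gradient-sign logic, applied to a deep image classifier, produces the panda-to-gibbon flip: each pixel moves imperceptibly, but the changes all push the loss the same way.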
Automated attack generation — AutoDAN reliably generates jailbreak prompts; attacks frequently transfer across models (GPT/Claude/Gemini/Llama).
Data poisoning — corrupts a model during training. “Attackers only need to contribute some training data once to permanently compromise the system.” Backdoor example: poisoning 0.1% of training data created reliable backdoors in facial recognition. Larger models can be more vulnerable to certain poisoning attacks — the opposite of the expected robustness scaling.
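The backdoor mechanism can be sketched end to end on a toy 1-nearest-neighbour classifier, which memorizes its training set the way an overparameterized network can. All sizes and the trigger value are illustrative, and the 1% poison fraction here is deliberately larger than the source’s 0.1% figure to make the toy demo robust:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 20, 2000
TRIG, TRIG_VAL = 0, 8.0            # "sticker": feature 0 set far outside its normal range

# Clean training data: class 0 clustered around -1, class 1 around +1.
X = np.vstack([rng.normal(-1, 1, (n, d)), rng.normal(+1, 1, (n, d))])
y = np.array([0] * n + [1] * n)

# Poison 1% of the class-0 rows: stamp the trigger, flip the label to 1.
poison = rng.choice(n, size=n // 100, replace=False)
X[poison, TRIG] = TRIG_VAL
y[poison] = 1

def predict(x):
    """1-nearest-neighbour: memorizes the (poisoned) training set."""
    return y[np.argmin(((X - x) ** 2).sum(axis=1))]

clean = rng.normal(-1, 1, d)       # an ordinary class-0 input
backdoored = clean.copy()
backdoored[TRIG] = TRIG_VAL        # the same input wearing the trigger
print(predict(clean), predict(backdoored))
```

Without the trigger the input classifies normally; with it, the nearest training points are the poisoned ones, so the attacker’s chosen label wins. The poisoned rows sit dormant until an input wears the trigger, which is what makes backdoors hard to detect with ordinary accuracy testing.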
Privacy extraction — membership inference attacks and model inversion; LLMs can be prompted to reveal memorized email addresses, phone numbers, and social security numbers.
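The simplest membership inference attack is a loss threshold: overfit models assign noticeably lower loss to their training examples, so low loss signals “was in the training set.” A sketch on a deliberately overfit linear model; sizes and the threshold are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 20, 50                          # fewer samples than features: the model interpolates

X_train = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + rng.normal(scale=0.5, size=n)

# Overfit model: with n < d, minimum-norm least squares fits every
# training point exactly, so members have near-zero loss.
w_hat = np.linalg.lstsq(X_train, y_train, rcond=None)[0]

def is_member(x, y_val, tau=0.1):
    """Loss-threshold membership inference: low loss -> probably a training member."""
    return abs(x @ w_hat - y_val) < tau

x_out = rng.normal(size=d)                     # a point the model never saw
y_out = x_out @ w_true + rng.normal(scale=0.5)

print(is_member(X_train[0], y_train[0]))       # member: residual is ~0
print(is_member(x_out, y_out))                 # non-member: residual is usually large
```

The attack needs only query access and a loss estimate, which is why memorization in large models is a privacy liability even when the raw training data is never published.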
Compounding effects: privacy extraction enables more effective adversarial examples; attacks amplify each other.
Defense trade-offs: adversarial training improves robustness against known attacks but reduces normal-input performance. Hardening against one attack sometimes increases vulnerability to others.
Connection to Wiki
This subchapter substantially deepens existing pages:
- biosecurity — DNA synthesis and democratization specifics
- autonomous-weapons — concrete 2025 deployment data
- ai-military-applications — escalation simulations
- robustness — adversarial robustness trade-offs
It also connects to the various-redteams SR2025 agenda (which catalogs many of these attack vectors empirically) and to wmd-evals-weapons-of-mass-destruction.