AI Safety Atlas Ch.2 — Misuse Risks

Source: Misuse Risks

How humans deliberately leverage AI capabilities for harm across biological weapons, cyberattacks, autonomous weapons, and adversarial AI exploitation. “Technology is an amplifier of intentions”: each technological advance expands the potential harm radius.

Bio Risk

Offense-defense imbalance: developing a novel virus currently costs on the order of $1 billion; AI tilts this balance further toward attackers by lowering that barrier.

Empirical demonstrations:

  • Researchers redirected a drug-discovery AI to optimize for toxicity, generating “40,000 potentially toxic molecules within six hours.”
  • Students without biology backgrounds used AI chatbots to identify pandemic pathogens, production methods, DNA synthesis firms likely to overlook screening, and detailed protocols — all within one hour.

Moving capability frontier: experts predicted AI wouldn’t match top virology teams on troubleshooting until after 2030; testing showed the threshold had already been reached.

DNA synthesis vulnerabilities: in a 2023 MIT study, researchers used simple evasion tactics to order 1918 pandemic flu fragments and ricin-encoding DNA; 12 of 13 International Gene Synthesis Consortium members fulfilled the disguised orders.

Democratization trend: declining DNA synthesis costs (halving every 15 months) + cloud labs + benchtop synthesis machines + AI assistance → bioweapon creation increasingly accessible to non-institutional actors.
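
To make the compounding explicit, here is a minimal sketch of how a fixed 15-month halving time plays out over a decade (the halving interval comes from the trend above; the starting cost is a normalized placeholder, not a real market price):

```python
# Illustrative cost projection under a fixed halving time. Only the 15-month
# halving interval comes from the text above; the start value and horizon are
# arbitrary placeholders.
HALVING_MONTHS = 15

def projected_cost(initial_cost: float, months_elapsed: float) -> float:
    """Cost after `months_elapsed` months, assuming it halves every HALVING_MONTHS."""
    return initial_cost * 0.5 ** (months_elapsed / HALVING_MONTHS)

for years in (1, 2, 5, 10):
    print(f"after {years:>2} years: {projected_cost(1.0, 12 * years):.4f}x today's cost")
```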

This connects to and substantially deepens the wiki’s existing biosecurity page.

Cyber Risk

Existing vulnerabilities scale: the 2024 CrowdStrike software update caused an estimated $5B in damage across airlines, hospitals, and banks. “Cyberattack overhangs” exist: devastating attacks have not yet happened because of attacker restraint, not because defenses are robust.

AI-enabled capabilities:

  • Phishing at scale — AI-generated emails achieved a 65% success rate vs. 60% for human-written ones, and took 40% less time to create.
  • Voice/visual impersonation — minutes of audio suffice for voice cloning; a single image for face-swap deepfakes.
  • Autonomous exploitation — AI agents “successfully hacked 73% of test targets” without human guidance. OpenAI’s o3 helped discover a zero-day Linux kernel vulnerability that required expert kernel knowledge.
  • Malware acceleration — WormGPT generates malicious code without requiring expertise; polymorphic malware automatically creates variations that security tools don’t recognize.

Cost transformation: autonomous AI agents can hack websites for ~$10/attempt — 8× cheaper than human expertise, enabling unprecedented scale.

Offense-defense balance: attackers need only one weakness; defenders must secure everything. AI enables “flash attacks” executable in minutes, outpacing human response.

Autonomous Weapons Risk

No longer theoretical:

  • Libya 2021 — autonomous drones made targeting decisions without human control
  • Ukraine — AI-enabled loitering munitions with autonomous target tracking (both sides)
  • Gaza — AI-guided drone swarm attacks
  • Turkey’s Kargu-2 — finds and attacks targets autonomously

Driving incentives: speed (DARPA’s AI beat F-16 pilots in simulated dogfights with maneuvers “too precise and rapid for humans to counter”); cost (US Replicator program: thousands of autonomous drones at fraction of traditional aircraft cost); resilience (GPS-denied environments preclude human-in-the-loop control).

Erosion of meaningful human control: under battlefield stress, operators face “only seconds to verify computer-suggested targets” and default to accepting them. The Lavender system assigns residents numerical scores predicting armed-group membership; human officers merely set score thresholds, and execution becomes automated downstream.

Arms race: China and Russia targeting 2028–2030 for major military automation; US deploying thousands of autonomous drones by 2025. “Only actors willing to compromise safety remain in the race.”

Escalation risks: AI military systems consistently recommend more aggressive actions than human strategists, including escalating to nuclear weapons in simulations. Multiple AI systems engaging create unexpected feedback loops “similar to financial flash crashes — except this time with missiles instead of stocks.”

This deepens the wiki’s existing autonomous-weapons and ai-military-applications pages, and the work of ann-katrien-oimann / andrew-rebera.

Adversarial AI Risk

Beyond misuse-of-AI, misuse-against-AI is its own category. Four sub-types:

Runtime attacks:

  • Visual perturbations — “a panda with imperceptible changes classified as a gibbon with 99.3% confidence” (see the sketch after this list)
  • Physical attacks — stickers on stop signs trick autonomous vehicles
  • DolphinAttack — ultrasonic commands inaudible to humans can control voice assistants from up to 1.7 m away
  • Prompt injection — Slack’s AI assistant leaked confidential info via injections placed in public channels
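
The panda/gibbon example in the first bullet comes from the fast gradient sign method (FGSM). A minimal sketch of that mechanism, using a toy classifier and a random stand-in “image” rather than the original ImageNet setup:

```python
import torch
import torch.nn as nn

# Minimal FGSM sketch: nudge every input pixel slightly in the direction that
# increases the model's loss. The classifier and "image" below are toy
# stand-ins, not the ImageNet model from the original panda/gibbon demo.
torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)   # stand-in input
label = torch.tensor([3])                               # stand-in true class

loss = nn.functional.cross_entropy(model(image), label)
loss.backward()                                         # gradient w.r.t. pixels

epsilon = 0.01                                          # imperceptibly small step
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("original prediction:   ", model(image).argmax(dim=1).item())
print("adversarial prediction:", model(adversarial).argmax(dim=1).item())
print("max pixel change:      ", (adversarial - image).abs().max().item())
```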

Automated attack generation — AutoDAN reliably generates jailbreak prompts; attacks frequently transfer across models (GPT/Claude/Gemini/Llama).

Data poisoning — corrupts models during training. “Attackers only need to contribute some training data once to permanently compromise the system.” Backdoor example: poisoning 0.1% of training data created reliable backdoors in facial recognition. Larger models can be more vulnerable to certain poisoning attacks, the opposite of the expected robustness scaling.
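
A toy illustration of the backdoor mechanism on synthetic data (the 0.1% facial-recognition figure above comes from the literature; the features, trigger pattern, and 1% poison rate below are arbitrary choices made purely for demonstration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy backdoor-poisoning demo on synthetic data: a small number of training
# points carry a fixed "trigger" pattern and a flipped label. The trained model
# behaves normally on clean inputs but misclassifies inputs carrying the trigger.
rng = np.random.default_rng(0)

n, d = 5000, 20
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # clean labelling rule

trigger = np.zeros(d)
trigger[-3:] = 4.0                               # trigger lives in 3 otherwise-unused features

poison_idx = rng.choice(n, size=int(0.01 * n), replace=False)  # 1% of the data
X[poison_idx] += trigger
y[poison_idx] = 1                                # attacker-chosen target label

model = LogisticRegression(max_iter=1000).fit(X, y)

X_clean = rng.normal(size=(1000, d))
y_clean = (X_clean[:, 0] + X_clean[:, 1] > 0).astype(int)
X_triggered = X_clean + trigger

print("clean accuracy:              ", model.score(X_clean, y_clean))
print("fraction triggered -> target:", (model.predict(X_triggered) == 1).mean())
```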

Privacy extraction — membership inference attacks, model inversion. LLMs can be prompted to reveal email addresses, phone numbers, social security numbers.
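
A minimal loss-threshold membership-inference sketch on synthetic data (this is the simplest form of the attack, shown only to illustrate why overfit models leak membership; the model, data, and threshold rule are arbitrary choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Loss-threshold membership inference: an overfit model assigns systematically
# lower loss to its own training examples, so thresholding per-example loss
# leaks whether a record was in the training set. Everything here is synthetic.
rng = np.random.default_rng(0)

X = rng.normal(size=(400, 10))
y = (X[:, 0] + rng.normal(size=400) > 0).astype(int)   # noisy labels -> overfitting
X_in, y_in = X[:200], y[:200]                           # members (training set)
X_out, y_out = X[200:], y[200:]                         # non-members

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_in, y_in)

def per_example_loss(X, y):
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

loss_in, loss_out = per_example_loss(X_in, y_in), per_example_loss(X_out, y_out)
threshold = np.median(np.concatenate([loss_in, loss_out]))  # idealized threshold

# Guess "member" whenever the loss falls below the threshold.
accuracy = ((loss_in < threshold).mean() + (loss_out >= threshold).mean()) / 2
print(f"membership-inference accuracy: {accuracy:.2f}  (0.50 = chance)")
```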

Compounding effects: privacy extraction enables more effective adversarial examples; attacks amplify each other.

Defense trade-offs: adversarial training improves robustness against known attacks but reduces normal-input performance. Hardening against one attack sometimes increases vulnerability to others.
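
A minimal FGSM-style adversarial-training loop, to make the “train on perturbed inputs” idea concrete (the toy model, synthetic data, and epsilon are placeholders; this simplest variant illustrates the mechanism rather than reproducing any robustness or accuracy numbers from the literature):

```python
import torch
import torch.nn as nn

# Sketch of FGSM-style adversarial training: each step perturbs the batch in the
# direction that increases the loss, then updates the model on the perturbed
# batch. Robustness to this attack typically costs some clean-input accuracy.
torch.manual_seed(0)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
epsilon = 0.1  # perturbation budget (placeholder)

def fgsm(x, y):
    """Return an FGSM-perturbed copy of x within an L-infinity ball of epsilon."""
    x_adv = x.clone().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

for step in range(200):
    # Toy data: label depends on the sign of a linear projection of x.
    x = torch.randn(128, 20)
    y = (x[:, 0] + x[:, 1] > 0).long()

    x_adv = fgsm(x, y)                    # inner step: craft worst-case inputs
    opt.zero_grad()
    loss = loss_fn(model(x_adv), y)       # outer step: train on perturbed inputs
    loss.backward()
    opt.step()

# Evaluate on clean vs. perturbed data.
x = torch.randn(2000, 20)
y = (x[:, 0] + x[:, 1] > 0).long()
with torch.no_grad():
    clean_acc = (model(x).argmax(1) == y).float().mean().item()
adv_acc = (model(fgsm(x, y)).argmax(1) == y).float().mean().item()
print(f"clean accuracy: {clean_acc:.2f}   accuracy under FGSM: {adv_acc:.2f}")
```

The trade-off described above is typically measured by comparing this model’s clean accuracy against an identically trained model without the perturbation step.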

Connection to Wiki

This subchapter substantially deepens the existing pages noted above (biosecurity, autonomous-weapons, ai-military-applications).

It also connects to the various-redteams SR2025 agenda (which catalogs many of these attack vectors empirically) and to wmd-evals-weapons-of-mass-destruction.