Outsource AI Risk to the Right People

    Anthropic CEO Dario Amodei was still arguing about Claude’s red lines in Washington, D.C., when Claude went to war. Just hours after the U.S. Defense Department effectively terminated its contract with Anthropic, U.S. officials likely used Claude in their airstrikes against Iran. Weeks later, how and in what capacity it was used remain unclear.

    The Pentagon’s ongoing row with Anthropic is only the most visible symptom of a deeper affliction. In recent months, a raft of AI ethics and safety experts have resigned from the top AI companies. They were not the first, and they likely will not be the last. And if the resignations continue, AI’s future will be decided without the people who are most concerned about reducing its risks.

    For most people, most of the time, the dangers of AI are remote—the kind of risk that we outsource to experts in the same way that we expect aviation engineers to think about airplane crashes so we don’t have to.

    The spectrum of AI risks is new, and the community of experts who will manage those risks is still taking shape. Nuclear experts have been here before. And their history offers a lesson: keep dissenting voices in the room or risk a fractured expert class whose incentives may not align with the broader public’s.


    In interviews and in messages to employees, Amodei ceaselessly recommends one book: Richard Rhodes’s near-900-page The Making of the Atomic Bomb, a Pulitzer Prize-winning account of the Manhattan Project.

    It’s not a triumphalist text. Rhodes documents the scientists’ repeated failures to control how their new technology would be used. Physicists Leo Szilard and James Franck—and even J. Robert Oppenheimer—tried to set conditions on the bomb’s employment, arguing for international controls on nuclear technology. Their efforts were overtaken by the imperatives of an emerging Cold War.

    Those who made their peace with the new national security paradigm, such as John von Neumann and Edward Teller, joined a new class of intellectuals, the “nuclear priesthood.” They developed new methods and technologies such as game theory and cutting-edge computation. They spoke their own abstract “technostrategic” language and worked out logics of nuclear warfighting, escalation, and deterrence outside the scope of public scrutiny. Dissenting voices who broke from the nuclear orthodoxy—supporters of disarmament, smaller arsenals, or even different deterrence postures—were excluded from centers of power and planning. Oppenheimer, Bernard Brodie, Daniel Ellsberg—dissenters were stripped of clearances, sidelined, or ignored.

    The lesson that Amodei and others appear to draw from Rhodes is one of unintended consequences: The scientists who built a transformative technology repeatedly failed to control it. But AI advocates might also be drawing another unintended and risky lesson from this history. Despite the Manhattan scientists’ unheeded warnings, a vast arms buildup, and a superpower confrontation that often turned violent, no leader has used nuclear weapons in a war since the United States bombed Japan in 1945. Perhaps this means that the system worked without needing dissent. Even if Amodei isn’t overly reassured, other AI researchers might be reading nuclear history to find a false source of comfort.

    There’s a catch. Historical research on close calls (false warnings, nuclear weapons accidentally dropped from planes, lost at sea, or erroneously transported, and crises that almost spiraled out of control) suggests that nuclear war was just a hairbreadth away, again and again. The nuclear record thus suffers from a survivorship bias in which nuclear policy’s ostensible success in preventing nuclear war discounts the necessary role played by luck.

    This record has domesticated nuclear weapons’ catastrophic potential, mentally shunting the horror of nuclear war to a background condition. The most catastrophic dangers of artificial intelligence, like those of nuclear weapons or engineered pathogens, seem remote. They are hard to imagine and will seemingly happen somewhere else or to someone else—if we believe that they will ever happen at all. This is the problem of psychological distance, the subjective sense that a threat is remote from the self, the here, and the now.

    We thus depend on expert communities—“priesthoods”—to keep distant risks in mind precisely because the public cannot. We’re too consumed by our everyday concerns or distracted by news cycle turnover. We trust in priesthoods to keep us safe, and distance enables these groups to capture and manage risks largely outside the public eye.

    But priesthoods may develop different interests and face different incentives than the public. When those interests sufficiently diverge, priesthoods become a problem—such as when they sacrifice safety in the pursuit of profit or prestige.


    The resignations of AI safety researchers from Anthropic and OpenAI do not bode well for AI’s emerging priesthood dynamics. The competitive logic of the AI race may be doing to them what it did to the Manhattan Project’s dissenting scientists. AI companies will show them the door, pay lip service to their concerns, and build bigger anyway. Market incentives may be even more powerful than Cold War competition in relegating dissenting voices to the sidelines. Geopolitical incentives push in the same direction by nesting competition between U.S. AI developers inside a larger one with China. The speed and scale of frontier model development mean that efforts to regulate AI from above, such as the EU’s AI Act or the United Nations’ Global Dialogue on AI Governance, always seem several steps behind.

    Racing ahead has sometimes been presented as a safety strategy: Companies at the frontier can supposedly discover and fix model failures by releasing models, step by step, into the world and seeing what happens. But the race-to-safety logic assumes that model failures are recoverable and retroactively transparent. AI’s “black box” outputs are opaque in ways that make failures harder to attribute and harder to learn from.

    AI’s priesthood is thus forming under heavy headwinds. Regulating AI may be an even greater challenge than regulating nuclear weapons because its dangers seem to be everywhere: discriminatory deployment, mass surveillance, autonomous lethality, and even superintelligent “killer AI” are all on the AI safety agenda. This cross-domain breadth, and commercial pressures to compete, make the dangers all too easy to defer.

    What’s more, nuclear weapons may have transformed international relations, but they are ultimately very big bombs with limited use contexts. AI, by contrast, has such expansive use contexts that it has been likened to electricity and water.


    The good news is that if the nuclear example illustrates what can go wrong, other industries—especially those with frequently used, reliably safe, everyday technology—may show what can go right. The history of aviation, for example, suggests that durable safety architectures are neither purely voluntary nor simply imposed from above. They emerge from the messy interaction between accidents, litigation, engineering know-how, organizational change, and regulatory pressure. Aviation safety improved over time in part because a single airplane accident, such as the runway collision at LaGuardia on March 22, has a low individual impact on the overall industry and thus allows aviation safety experts to find solutions to problems after they happen.

    Regulatory pressure for AI has been turbulent so far, but the emerging record points away from safety. In 2025, President Donald Trump’s White House revoked the Biden administration’s 2023 executive order on AI and narrowed its safety institute’s mandate from broad safety oversight to national security and competitiveness, even removing “safety” from the institute’s name. In mid-March, it unveiled a light-touch national legislative framework calling on Congress to “preempt state AI laws” that conflict with its priorities.

    Both the White House and industry representatives have criticized the European Union’s 2024 AI Act for what they see as regulatory overreach while Europe’s AI talent continues to migrate elsewhere. A governance gap has emerged instead of an AI grand bargain: Where regulation has teeth, AI investment migrates; where it migrates, regulatory oversight softens. This regulatory asymmetry may winch the emerging AI priesthood’s interests further and further from the public’s.

    What comprehensive AI safety governance may unfortunately need is a triggering crisis that closes the distance and catalyzes action. This could be a vivid, concrete catastrophe—like the midair collision in the Grand Canyon that paved the way for the Federal Aviation Administration—that is visible to the public and shocking enough to make AI harms feel close to home. Without such a catalyst and sustained subsequent pressure, the costs of deferral may always be more bearable than those of precaution.

    AI failures so far remain diffuse and probabilistic, their harms distributed across millions of interactions rather than concentrated in single wreckage sites. They are still more akin to incidents than accidents: single component failures that, although they cause harm, are fixable after the fact. This reduces outside pressures to impose more safety measures on AI, and it makes the case for safety harder to make on the inside, too—especially given engineers’ incentives to build better, faster, and stronger, before safer.

    AI also has another safety problem: speed. Aviation had time to fail in miniature before it scaled. In the late 1920s, airplane accident rates were high enough that, if applied to the scale of the airline industry today, close to ten thousand people would die per year. By the 1950s, airline safety had improved: about a hundred deaths per year in the United States in an industry 35 times smaller than today’s. But the 1950s rate would still result in thousands of deaths a year at today’s scale—instead of the few hundred recorded in 2025.

    AI’s wide, speedy, and multidomain adoption makes it a challenging safety case. Right now, its potential harms range from bad medical advice to the end of humanity. The greater the safety challenge, the greater the need for creative thinking. AI companies that are serious about safety and control will need to find ways to keep dissenting voices in the room—devil’s advocates in hard hats—and insulate them from the pressures that push them out.

    AI companies may also have less-than-high-minded incentives to do so. Companies that are outwardly principle-driven—treating safety and ethics researchers as constructive critics rather than internal threats—may bring safer products and better reputations to market. After OpenAI signed the Pentagon deal instead of Anthropic, Claude got a dramatic boost in popularity, topping its rival for the first time in U.S. app store rankings.

    During the Cold War, the U.S. government siloed dissenting nuclear experts, and the world survived. If global consumers don’t want to take the same gamble with AI, they should pressure companies to take a more responsible approach to safety.
