(Ir-)Responsible by Design? Corporate Guardrails and the Governance of Military AI
[Jessica Dorsey is an Assistant Professor of International Law at Utrecht University School of Law; Elke Schwarz is a Professor of Political Theory at Queen Mary University of London; Ingvild Bode is a Professor of International Relations, University of Southern Denmark; Zena Assaad is an Associate Professor at the School of Engineering, Australian National University; and Neil Renic is a Lecturer in Ethics at the University of New South Wales. The authors are all members of the Independent Advisory Board on Legal Reviews of the Responsible by Design Institute. Our first post analyzing recent developments can be found here.]
The recent clash between the U.S. Department of Defense (DoD) and Anthropic, which we first outlined here, marks a crucial moment in the governance of military artificial intelligence (AI). What began as a contract dispute has quickly escalated into a broader public confrontation over whether private AI developers seeking to supply systems to the DoD must dismantle their own safety guardrails to accommodate the more permissive “all lawful” military uses standard dictated by the Pentagon. At stake is not only a $200 million defense contract but a set of far more consequential questions: the integration of AI models into autonomous weapon systems, their potential use in large-scale surveillance of American citizens, and the broader governance framework that will shape future military AI.
Anthropic’s CEO Dario Amodei responded Thursday, refusing to remove two guardrails in particular: prohibitions on the use of its models for mass domestic surveillance and for fully autonomous weapons operating without human oversight. In its statement, Anthropic declared that it “cannot in good conscience accede” to Pentagon demands that would permit the unrestricted military use of its AI models. This stance follows the recent amendment to Anthropic’s Responsible Scaling Policy, which softened and broadened some of its original safety guardrails. That apparent attempt at compromise did not satisfy the DoD, which continued to demand unrestricted use of Anthropic’s AI systems.
The standoff highlights a fundamental legal issue at the heart of ongoing deliberations within the Group of Governmental Experts on Lethal Autonomous Weapons Systems (GGE on LAWS) under the Convention on Certain Conventional Weapons (CCW), set to resume discussions today in Geneva. At issue is what constraints, if any, domestic and international law requires when AI systems are capable of identifying, selecting, and potentially engaging targets without intervention by a human operator in executing these tasks.
Guardrails and the Legal Framework
In its statement, Anthropic made clear that its guardrails are neither a rejection of cooperation with U.S. defense efforts nor a denial that fully autonomous weapons may be “critical for national defense.” Rather, the guardrails seek to ensure that such systems operate within safe and reliable technical limits. The company emphasized that it supports national security objectives, but not at the cost of erasing safeguards around lethal autonomy and U.S. civil liberties.
This stance has technical, legal, ethical and political implications. The technical limitations are important: Anthropic’s statement explicitly notes that “Some uses are also simply outside the bounds of what today’s technology can safely and reliably do.” Large language models (LLMs), such as Anthropic’s Claude, are limited in their operating capabilities, and these limitations emerge in the form of incorrect and nonsensical outputs. As Heidi Khlaaf has pointed out, these are flawed and inaccurate systems with a narrow capacity for completing tasks not captured within their training data sets, and the dynamic and unpredictable nature of military operations exacerbates this constraint. It is almost impossible to fully capture military operating environments in static data sets, making these data-enabled AI systems inherently constrained in their applicability. While any AI model suffers from challenges arising from data constraints in dynamic contexts of conflict, LLMs face an additional limitation: they are probabilistic models that need “pretraining, a process of predicting the next word in huge amounts of text.” This leaves the model vulnerable both to erroneous output (a mathematical inevitability of such models, according to OpenAI) and to data poisoning. According to a recent study conducted by Anthropic, “a small number of samples can poison LLMs of any size.” This makes LLMs particularly fragile for the general context of warfare, let alone for use within fully autonomous weapon systems.
From a legal standpoint, under international humanitarian law (IHL), the principles and rules relating to distinction, precaution and proportionality require belligerents to differentiate between combatants and civilians, to take all feasible measures to avoid or minimize civilian harm, and, where civilian harm is unavoidable, to refrain from attacks expected to cause civilian harm excessive in relation to the anticipated military advantage. Fully autonomous weapon systems, those capable of selecting, identifying and engaging targets without human intervention, pose acute challenges to these principles. This is especially true where enemy combatants are un-uniformed and must therefore be targeted on the basis of suspected conduct. If no human meaningfully evaluates context, intent, and the ability to minimize civilian harm, or undertakes proportionality assessments at the moment force is applied, legal compliance becomes exceedingly difficult, if not impossible.
IHL does not explicitly ban autonomous weapons. Nevertheless, it presupposes the exercise of qualitative human judgment embedded within decision-making processes, particularly in the application of core principles such as distinction, precautions in attack and proportionality assessments. In addition, States are required under Article 36 of Additional Protocol I to review new weapons, means, and methods of warfare to ensure compliance with IHL. Yet such reviews are undertaken by only some States and vary considerably in their rigor and transparency.
To be clear, a private company’s guardrails cannot replace States’ obligations under Article 36, nor can they serve as a proxy for formal weapons reviews. They may, however, function as complementary ex ante safeguards by embedding legal and ethical constraints directly into the design and deployment architecture of the technology. In this sense, they contribute an important additional layer of precaution to the broader compliance ecosystem, reflecting what we describe as a “responsible-by-design” approach that operationalizes normative, ethical and legal commitments in technical form.
In ethical terms, the current debate between Anthropic and the Pentagon seems to set the bar problematically low. A decade’s worth of research on human-machine interaction in the context of autonomous weapon systems (AWS) suggests that human moral agency is likely to diminish the more the human is embedded in environments where AI has cognitive and epistemic authority. Recent conflicts in which AI-enabled decision support systems (AI-DSS) have played a role have shown that this might compromise the possibility for restraint in the use of violence, by undermining both the capacity and willingness of human operators to act ethically. With LLMs, this problem may well intensify significantly. For the foreseeable future, LLMs should likely be nowhere near a targeting decision loop.
While it is laudable that Anthropic has so far held firm on its guardrails, we should expect nothing less of companies that produce AI models that are technically unreliable and capable of producing legally and ethically questionable outcomes on the battlefield. But is this enough, or does it merely distract from deeper discussions on more robust safeguards for AI technology in AWS?
Anthropic’s counterpoint to the Pentagon revolves primarily around existing shortcomings in reliability, leaving the door relatively wide open to the proposition that, one day, the technology might be good enough for the uses suggested. In doing so, the dispute draws attention away from the more fundamental and intractable challenges that AI raises, in general, for use in making targeting decisions, especially against human targets. Moreover, we would be remiss not to acknowledge that Amodei’s statement does not, per se, advocate for more stringent international regulation of fully autonomous weapon systems. Rather, the CEO’s position seems to favor very limited regulatory intervention (because anything more sweeping might have detrimental economic effects), while we wait and see whether AI systems pose an “imminent and concrete danger.” But have we not already learned the stark lesson that a wait-and-see approach puts governmental regulatory efforts on a significant back foot?
Inadvertently, the dispute between Anthropic and the Pentagon elevates the speculative status of AI and positions the companies that produce these systems as the last arbiters of reason against unreasonable (U.S.) state demands. However, the safety, ethical and legal bar for AI use in contexts of conflict should be set much higher, a standard that other states, especially in a European context, still (for now) aspire to.
‘Goliath vs Goliath’?
Following the lapse of the 5 p.m. EDT deadline on Friday, the Pentagon seemed to make good on one of its threats and designated Anthropic as a “supply-chain risk.” The designation, usually reserved for foreign adversaries, effectively bars any U.S. military contractors, and associated subcontractors, from doing business with Anthropic. This comes after President Trump ordered all federal agencies to stop using the company’s technology. Nonetheless, as the Wall Street Journal reports, Anthropic’s AI model allegedly played a role in the strikes against targets in Iran on February 28th, potentially showcasing how difficult it may be for the DoD to extract itself from these ubiquitous systems and processes. Anthropic says it will challenge the designation in court as unprecedented and legally unsound.
Just hours after the news broke of the “supply chain risk” designation, OpenAI announced that it had reached an agreement with the DoD to allow its AI models to be deployed on classified military networks under strict safety conditions, including barring the technologies from being used for domestic mass surveillance and requiring human oversight of any use of force, even in autonomous weapons, reflecting core “red lines” the company said it shares with Anthropic. This adds further confusion to an already bizarre public spectacle. The alleged wording of OpenAI’s contract with the DoD, that “The AI System will not be used to independently direct autonomous weapons in any case where law, regulation, or Department policy requires human control,” leaves room for flexible and open interpretations of these “red lines.” While Anthropic was clearer and firmer in its stance, OpenAI seems to be taking a more strategic position: publicly presenting itself as upholding safety, legal and ethical “red lines” while ensuring there is opportunity to exploit the military “gray zones” the DoD claims it cannot avoid.
The fact that OpenAI suggests it was able to secure the guardrails that Anthropic could not offers some indication that this was likely more of a political-ideological battle than a purely policy-driven dispute. It also helps explain the spat: Anthropic could not get the Pentagon to agree to its guardrails, while OpenAI apparently could (or at least to a modified version of them), underscoring how much of the conflict may have hinged on politics, pique, and leverage rather than on the substance of the safeguards themselves. It is possible, however, that the broader contours of the OpenAI deal with the DoD were long in the making (as strongly suggested by the short timeline along which events unfolded) and that we were all witness to a very public intra-industry battle for dominance in the U.S. defense environment.
As we highlighted in our first post, the Pentagon has reportedly insisted that access to advanced AI models must be available for “any lawful use.” From the DoD’s perspective, contractual limitations imposed by private firms may impede operational flexibility and strategic readiness. In an era of intensifying geopolitical competition, military planners may view guardrails as unilateral constraints that adversaries will not observe. But we are worried about this from a different angle: who determines in this context what is “lawful”? The DoD conducts its own legal reviews and asserts compliance with domestic and international law. It may argue, as it has in the recent past in Venezuela, that certain decisions about appropriate uses of force fall within the constitutional authority of the executive branch and are not subject to congressional oversight.
This position becomes far more fraught when the same government claiming authority to define legality is actively breaching international law. Just this weekend, the U.S. and Israel began carrying out military operations against Iran in stark violation of the prohibition on the use of force under international law. From a legal perspective, the strikes the U.S. and Israel have carried out implicate illegal acts of aggression. As critics have warned, AI systems are already contributing to real-world harm, because these brittle, error-prone models are being deployed in high-stakes environments where mistakes scale rapidly and accountability remains diffuse. When such systems are integrated into military targeting or operational decision-making, the risk of misidentification, faulty data interpretation, or automation bias can translate directly into civilian injury or death. Research on civilian protection underscores that, without stringent legal oversight, transparency, and appropriate levels of human control and judgement, AI-enabled warfare threatens to erode the very safeguards designed to prevent disproportionate and indiscriminate harm.
When a state, and more specifically its executive branch, effectively serves as the judge of its own compliance, and the international community is largely silent in response, the line between lawful and unlawful conduct becomes entirely elastic and politically contingent.
That reality underscores the deeper structural problem: determinations of legality in the AI-enabled use-of-force context cannot rest solely with national authorities, nor should they rest with U.S.-based AI companies. What we have witnessed in the past week is a positioning of U.S. viewpoints as the state of the art in a debate that is far from over and in which the international community seems to be somewhat sidelined. It is a dispute for competitive advantage in which there is a risk that safety, ethical and legal standards will become collateral damage. It elevates both the Pentagon and (U.S.) industry as the ones to hash out the parameters of both proper and de facto use. In line with the DoD’s strategy, it gives prominence to U.S. technology companies. But amidst the spectacle, we must ask: where does that leave the international or technological actors who advocate for more caution? Where does this leave states or companies that would set guardrails more stringently? And perhaps most importantly, what voice does this give to the individuals and communities likely to be on the receiving end of these faulty technologies? What of the rights of civilians in and around the battlefield to be free of unjust injury and death? The current debate offers little in terms of navigating, or even recognizing, the thorny complexities of these competing duties.
Concluding Thoughts: The Fierce Urgency of Now
This dispute and the attention it has attracted underscore the urgent need for internationally binding guardrails and governance frameworks for AWS and AI-DSS, ensuring the rules governing their planning, design, development and deployment are not defined unilaterally by those who stand to benefit the most from more expansive interpretations. Recent events in Venezuela and Iran demonstrate clearly that the use of AI-enabled military tools is not merely a domestic procurement matter, but a foreign policy issue with consequences for the international community as a whole. The ethical and legal thresholds for such technologies, and the values that underpin them, cannot be set solely by one administration or by powerful private companies lacking democratic accountability. Durable and legitimate regulation will require broader international engagement, shared standards, and genuine multilateral buy-in.