Symposium on Military AI and the Law of Armed Conflict: The ‘Need’ for Speed – The Cost of Unregulated AI Decision-Support Systems to Civilians

[Marta Bo is a Senior Researcher at the T.M.C. Asser Institute and an Associate Senior Researcher at the Stockholm International Peace Research Institute (SIPRI).]

[Jessica Dorsey is an Assistant Professor of International and European Law at Utrecht University and Managing Editor of Opinio Juris.]

Against the backdrop of Israel’s military campaign in Gaza, noted as the most destructive in a century, and other ongoing conflicts in Ukraine, Yemen, Iraq and Syria, this contribution is prompted by one element common to these conflicts: the reported military use of AI-enabled decision-support systems (AI-DSS). Klonowska defines AI-DSS as “tools that use AI techniques to analyse data, provide actionable recommendations, and ‘assist decision-makers situated at different levels in the chain of command to solve semi-structured and unstructured decision tasks’.” These AI-based systems can be used to process data (such as human intelligence (HUMINT), drone footage, intercepted communications and, in some cases, information gathered in real time through other IS[TA]R capabilities), including patterns of behaviour of individuals, feeding into the military decision-making process and enabling the production, identification and even nomination of targets at speed.

The reported use by the Israel Defense Forces (IDF) of the Gospel and its companion systems, Fire Factory, Depth of Wisdom, Alchemist, and Lavender, within the targeting cycle demonstrates a growing reliance on AI-DSS for military operations. This, along with the development, testing and potential deployment of other AI-DSS for targeting, such as Project Maven (the US’ Algorithmic Warfare Cross-Functional Team), the integrated Gorgon Stare and Agile Condor systems on MQ-9 Reaper drones, and systems such as Palantir’s Artificial Intelligence Platform (AIP) Defense and MetaConstellation being used in Ukraine, shows that this is the direction states (and the tech companies driving this development, such as Microsoft, Google, Amazon and Clearview AI) are choosing to go.

In a recent post, Renic and Schwarz made a number of salient points about how the increased speed and scale of target production through military AI erode moral restraints in war. In this post, we echo their problematisation of the speed and scale of AI-enabled targeting and complement the moral issues they raise by pointing to several legal challenges that make the ‘unregulation’ of AI-DSS problematic and warrant increased attention.

The Speed and Scale of AI-Enabled Targeting Leave Little Room for Human Judgement

Speed is an essential feature helping to explain why AI-DSS are on the rise. As scholars illustrate: “the aim is to outpace the opponent’s OODA-loop (i.e., Observe, Orient, Decide, Act) and AI-based automation can be an important driver of such efficiency gain.” In Gaza, it has been reported that speed has played a prominent role in the targeting of Hamas leaders. +972 Magazine quoted an Israeli official describing the process: “We work quickly and there is no time to delve deep into the target. The view is that we are judged according to how many targets we manage to generate.” Another source revealed that there are “cases in which we shell based on a wide cellular pinpointing of where the target is, killing civilians. This is often done to save time, instead of doing a little more work to get a more accurate pinpointing.” Regarding human review of targets, Mimran points out: “In the face of this kind of acceleration, those reviews become more and more constrained in terms of what kind of judgment people can actually exercise.”

Another publication by +972 Magazine from 3 April 2024 reports that a further AI-enabled system used by the IDF, called Lavender, together with a tracking system bearing the insidious name “Where’s Daddy?”, has been used in Gaza to generate and track human targets at speed and scale, and to carry out attacks once those targets entered their family homes. According to one intelligence officer quoted in the report:

“We were not interested in killing [Hamas] operatives only when they were in a military building or engaged in a military activity. On the contrary, the IDF bombed them in homes without hesitation, as a first option. It’s much easier to bomb a family’s home. The system is built to look for them in these situations.”

Another mentions:

“It was like that with all the junior targets…The only question was, is it possible to attack the building in terms of collateral damage? Because we usually carried out the attacks with dumb bombs, and that meant literally destroying the whole house on top of its occupants. But even if an attack is averted, you don’t care — you immediately move on to the next target. Because of the system, the targets never end. You have another 36,000 waiting.”

The report outlines a six-step process through which Lavender and “Where’s Daddy?” were integrated into the targeting cycle, with devastating effects on the civilian population in Gaza:

  1. Target generation that displaces humans by default from the decision-making process;
  2. Linking targets to family homes, resulting in significantly higher civilian casualties by design;
  3. Choice of weapons: once targets were selected, often opting for “dumb bombs” against lower-level targets in densely populated areas instead of more expensive, more precise munitions (within a “munitions economy” framework);
  4. Authorization of up to 20 civilian casualties per low-level Hamas target, without performing the requisite weighing of expected civilian harm against anticipated military advantage as prescribed by IHL (as one officer noted: “In practice, the principle of proportionality did not exist”);
  5. Despite indications that this acceptance rate has been lowered over time, policies still permitting a civilian casualty acceptance rate of hundreds of civilians per high-level commander targeted;
  6. Relying on automated target location indications without a human verifying the presence of the intended target.

The customary IHL principle of precautions in attack imposes a positive obligation on those planning an attack ‘to do everything feasible to verify’ the military nature of individuals or objectives. This obligation is crucial to ensure compliance with the customary IHL principle of distinction, which requires that military personnel ‘at all times distinguish between the civilian population and combatants and between civilian objects and military objectives and accordingly shall direct their operations only against military objectives.’ This principle entails two key points: the requirement to direct an attack at a specific military objective (Art. 51(4)(a) API) and the prohibition against treating a wide area as a singular objective (Art. 51(5)(a) API). Unfortunately, it seems that both points are being ignored.

The precautionary principle also entails an overarching obligation of constant care in the conduct of military operations, which states have recognised as an ongoing obligation throughout the conduct of hostilities to spare civilians and civilian objects to the greatest extent possible. In a scenario where military personnel are unable to “delve deep into the target” and strategic and operational objectives are not defined around sparing civilians to the greatest extent possible but rather around generating as many targets as possible, it is difficult to see how these systems contribute to compliance with the law. In fact, even if AI-DSS are meant to be used as ‘human in the loop’ systems (i.e., the recommendation about who or what to target is sent to a human decision-maker), the speed and scale of target production or nomination, coupled with the complexity of data processing, may make human judgment impossible or, de facto, meaningless. Whether or not an AI-DSS is employed, the sheer speed of decision-making raises concerns in and of itself about adherence to the precautionary principle. Put simply, rapid decision-making can outpace our ability to take the necessary precautions to prevent or minimise harm to civilians and to ensure constant care.

Regarding scale, the +972 article reports that in the past the IDF would generate 50 targets per year, whereas the Gospel system is able to generate upwards of 100 per day. The large volume of targets produced increases the likelihood of more strikes, largely because of the cognitive action bias. This phenomenon refers to the human tendency to take action, even when inaction would logically result in a better outcome.
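By way of rough arithmetic on the reported figures alone: 100 targets per day amounts to roughly 36,500 per year, more than a 700-fold increase on the earlier figure of around 50 per year, which gives a sense of the step change in scale.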

Further on biases, as the authors noted here and here, the cognitive phenomenon of automation bias could lead to over-trusting the AI-driven recommendations about targets. Automation bias refers to humans’ tendency to trust decisions made by machines without critically questioning the outcomes, even when presented with contradictory information. And speed only exacerbates this phenomenon. As Klonowska points out, “the speed and volume of target recommendations introduce a climate of risk where recommendations are not to be ignored.” Moreover, when humans encounter highly complex situations, our cognitive limitations often prompt us to opt for the path of least resistance and outsource our decision-making to automated systems. Research has demonstrated that in situations characterised by high cognitive complexity, pressure, stress, and time constraints, humans are more likely to defer to the judgment of machines. While a human theoretically remains in the loop, there are uncertainties regarding the extent to which humans truly maintain meaningful control or exercise judgment within these military decision-making processes.  

AI-DSS’s Error Rates, Accuracy Issues and Risks

In addition to speed and scale, the growing use of AI-DSS hinges on two key assumptions: decision advantage and accuracy. Both require closer examination. One stated goal is for AI to increase decision-making speed on the battlefield in pursuit of “decision advantage” (processing large amounts of information quickly in order to act first), in terms of shortening the OODA loop. However, critics of this perspective suggest that the OODA loop model was designed for application only in certain specific tactical settings (such as air-to-air combat) and does not translate well to, for example, the urban combat domain.

The ambition of shortening the OODA loop might therefore not be fit for purpose in complex urban settings like Gaza, and carries the inherent risk of lethal mistakes for which civilians bear the ultimate cost. Not rushing to decisions may offer better battlefield outcomes. As Stewart and Hinds explain, employing techniques such as tactical patience throughout the military decision-making process offers advantages such as being able to see more, understand more and develop more options. In taking such an approach, military operators are better equipped to understand the civilian harm implications of their operations, which helps minimise civilian harm to the greatest extent possible, in compliance with IHL obligations. This is even more important when operations take place in densely populated areas, especially considering the concerns under the precautionary principle raised above.

The second aspect we focus on is AI-DSS’s performance, including their accuracy. AI systems have inherent error rates. Error rates should be understood as the system’s failure to recognise what should be targeted (positive IDs) and what should not be targeted (negative IDs); AI-DSS can err in both directions. The systems being used by the IDF are reported to be accurate only 90% of the time; this translates to knowing in advance that roughly 1 out of every 10 individuals or objects marked will not be a legitimate military target. Moreover, AI systems are prone to a new kind of error that humans would not necessarily make, linked to their susceptibility to adversarial attacks, in which small changes undetectable to humans can alter the system’s output and lead to errors.
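To make the arithmetic behind that accuracy figure concrete, the following is a minimal, purely illustrative sketch. It assumes only the figures already cited in this post (a reported 90% accuracy rate and the roughly 36,000 marked individuals mentioned in the quote above) and shows how quickly a seemingly high accuracy rate translates into large absolute numbers of misidentified people.

```python
# Purely illustrative arithmetic based on figures reported in the +972 articles
# cited above; this is not a description of any actual system.

marked_individuals = 36_000   # individuals reportedly marked by the system
reported_accuracy = 0.90      # i.e., roughly 1 in 10 marked persons is not a valid target

expected_misidentified = marked_individuals * (1 - reported_accuracy)
print(f"Expected misidentifications: {expected_misidentified:,.0f}")
# Prints: Expected misidentifications: 3,600
```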

Finally, studies have shown that AI systems have (sometimes severe) limitations, such as gender and skin-type bias and susceptibility to failures in image recognition. Open and crucial questions remain as to target identification and nomination, such as: how are target profiles developed and input? How explainable are the algorithms within these systems, and must they reach a minimum required level of explainability before they can be used? Can AI-DSS interpret behaviour? Answers to these questions would help bolster the transparency, accountability and, ultimately, legitimacy of military targeting operations augmented by AI-DSS. Thus far, unfortunately, these answers remain elusive.

The general understanding of AI suggests that these systems are incapable of human reasoning, which makes the concerns arising in a complex information environment like urban warfare all the more relevant. Moreover, poor training, or a lack of training data on negative IDs, could bias an AI system and ultimately produce errors. The risk of misidentification associated with AI-DSS target identification and nomination carries extremely high costs, including increased harm to civilians residing in areas where AI-DSS are deployed.

Additionally, some errors stem from the unpredictability of the environment. Warfare generally, and urban warfare more specifically, is a complex and dynamic scenario in which the quality of available information is at times poor, yielding an erratic and unpredictable operational environment. This unpredictability can negatively impact the reliability and accuracy of AI-based systems, which are often developed and tested on highly linear datasets.
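The general phenomenon described here is often discussed in machine learning as distribution shift: a model that performs well on data resembling its development setting degrades when conditions no longer match that data. The toy sketch below, using only synthetic data and bearing no relation to any operational system, illustrates the point under those assumptions.

```python
# Toy illustration of distribution shift: a classifier trained under clean,
# predictable conditions loses accuracy when the data it encounters is noisier
# than the data it was developed on. Synthetic data only; illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, noise):
    """Two informative features; label depends on their sum; 'noise' models
    how degraded the observations are relative to ground truth."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    X_observed = X + rng.normal(scale=noise, size=X.shape)
    return X_observed, y

# Development setting: clean, low-noise observations
X_train, y_train = make_data(2000, noise=0.1)
model = LogisticRegression().fit(X_train, y_train)

X_dev, y_dev = make_data(1000, noise=0.1)       # conditions similar to development
X_field, y_field = make_data(1000, noise=1.5)   # erratic, degraded conditions

print("accuracy under development-like conditions:", round(model.score(X_dev, y_dev), 2))
print("accuracy under degraded conditions:", round(model.score(X_field, y_field), 2))
```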

The ‘Unregulation’ of AI-DSS

As illustrated above, AI-DSS are, in principle, used to assist human operators. As a basic technical starting point, the algorithms (the mathematical methods devised to carry out a task) within these systems are implemented in programs (the instructions, written in a programming language, for that task), which together make up software (a set of programs used, together with gathered data, to execute an application). As such, AI-DSS fall into the category of dual-use technologies, which makes regulation difficult. To give a practical example: the same facial-recognition software we may use to unlock our mobile phones is being used within a targeting process to “find, fix and finish” individuals in a conflict zone, at times with little to no human intervention. Consequently, while IHL applies to the use of these systems, there is potential for ambiguity regarding their current and future regulation under IHL and international law.
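To make that layered terminology concrete, here is a deliberately generic sketch of our own (not drawn from any system mentioned above): a simple similarity-threshold method stands in for the algorithm, the file implementing it is the program, and, bundled with data and other programs, it would constitute software. The decision rule itself is indifferent to whether the surrounding application is a consumer product or something else entirely, which is precisely the dual-use point.

```python
# Generic illustration of the algorithm / program / software layering.
# The "algorithm" is a similarity-threshold method; this file is a "program"
# implementing it; bundled with data and other programs it would form "software".
import math

def cosine_similarity(a, b):
    """The algorithm: a mathematical method for comparing two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def is_match(vector, reference, threshold=0.9):
    """The program logic built on the algorithm: a simple yes/no decision rule."""
    return cosine_similarity(vector, reference) >= threshold

# The rule is identical regardless of the application wrapped around it;
# only the context in which the software is deployed changes.
print(is_match([0.9, 0.1, 0.0], [1.0, 0.0, 0.0]))  # True
```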

For the past several years, the scholarly and policy debate on military AI has predominantly centred on autonomous weapons systems (AWS) and the potential threats they pose to civilians due to their autonomy in ‘critical’ functions (see the work of the Group of Governmental Experts (GGE) established under the framework of the 1980 Convention on Certain Conventional Weapons). The risks associated with AI-DSS have received comparatively less attention. This is likely due to the fact that AI-DSS retain a form of human-machine interaction and therefore give the (sometimes incorrect) impression of only assisting rather than replacing the role of humans in targeting processes. In short, humans would retain the ultimate responsibility for targeting decisions. However, when assessed against the realities of the speed, scale, and complexity of how these systems work operationally, as discussed above, this assumption might prove unrealistic.

Conclusion

States are increasingly opting to integrate AI-DSS into targeting decision processes. However, we must scrutinise how a perceived “need” for speed and scale is influencing the current integration of AI-DSS on the battlefield. It’s crucial to understand the actual risks involved and assess whether they outweigh any potential opportunities.

All the issues we have outlined raise questions about legal compliance, specifically related to the duty to take feasible precautions, crucial to ensure compliance with the rules of distinction and proportionality and ultimately aimed at minimising civilian harm. In situations like those described in Gaza—such as shelling more widely to save time or failing to verify target recommendations due to time constraints, potentially resulting in more civilian harm—it is very difficult to see how military personnel, assisted by these AI-DSS, are presently complying with their legal obligations, or whether under these circumstances legal compliance is possible at all. Linked to legal compliance are important concerns raised by some scholars around AI-DSS’s implications for the way legal assessments are conducted and the way we understand reasonableness in targeting decisions. We support the further examination of these areas as well. 

Moreover, the ‘unregulation’ of AI-DSS has arguably contributed to their legitimisation and normalisation in warfare. As mentioned, the GGE focuses on (L)AWS. While an expansion of the scope of the GGE on LAWS to include AI-DSS is not advisable (as it may thwart regulatory efforts and progress made within various fora), states should broaden their focus in regulatory discussions beyond (L)AWS alone. For example, in line with the approach adopted within the Human Rights Council, states need to expand discussions to include AI-DSS within the UN General Assembly First Committee on Disarmament and International Security, a forum that could potentially assume a leading role in drafting a regulatory framework. Another forum that could bring this issue more prominently to the fore is the upcoming Responsible Use of AI in the Military Domain (REAIM) Summit in South Korea, taking place in September this year.

Furthermore, while AI-DSS, as decision-support systems and dual-use technologies, do not squarely fall under the IHL regulation of ‘weapons, means and methods of warfare’, their implications are in some cases comparable (for example, when they constitute offensive capabilities by supporting decisions that contribute to the engagement in hostilities). In other cases, AI-DSS could simply be reprogrammed and used in semi-autonomous modes.

Given the arguments we’ve outlined above, it is important to carefully consider how these systems affect the conduct of hostilities, including through legal reviews, and to recognise the humanitarian consequences they pose for civilians through their use and experimentation. Regulation is essential in this regard. After all, as Rosen pointed out, “if AI speeds up killing in war, decision-makers must slow it down.”
