Expedient or Reckless? Reconciling Opposing Accounts of the IDF’s Use of AI in Gaza
[Christopher Elliott is a war crimes researcher based in Canada. He has a Master of Anthropology from the Australian National University and a Doctor of Philosophy from King’s College London.]
As the Israel-Hamas war drags through its sixth month, one topic of immense and continuing controversy is the use of AI tools by the IDF as part of its military targeting process. Two of these tools – the Lavender and Gospel systems – have become particularly controversial because of their prominent role in a devastating series of media reports by the Israel-based +972 Mag and The Guardian.
The Allegations
Succinctly put, the newspapers allege that in the initial weeks of the war, Lavender and Gospel were employed as cold enablers of mass death – the centrepieces of an overly permissive air campaign that contained “an element of revenge”. Citing numerous members of the Israeli intelligence community, the newspapers allege that the tools were used by Israeli targeting cells to automatically “generate” actionable targets at previously unprecedented rates, thereby facilitating the creation of a digitized “mass assassination factory”.
Gospel achieved this, the sources allege, by processing enormous amounts of data (too much for a single human to sift through) in order to track down “relatively junior” Hamas or Islamic Jihad operatives in their family homes.
Lavender, meanwhile, is used to make functional characterization assessments (to use a military term of art) about individuals – that is, the machine aggregates a bunch of data about a person and then spits out algorithm-informed suppositions about who is and who is not a member of Hamas.
Based on these reports, many lawyers, ethicists and activists have voiced fierce opposition to such indiscriminate practices, with Renic and Schwarz observing that, as described, “target expansion, not refinement, is the point and outcome of such systems.” More philosophically, but no less importantly from an ultimate-fate-of-humanity perspective, Agenjo argues that such depersonalization in the crucible of war amounts to the “oblivion of human dignity”.
Lavender, in particular, has been labelled a “Crimes Against Humanity Machine” – a system whose information-processing efficiencies are enabling a state policy of spree killings from the air.
The Israeli Response
Perhaps inevitably, the IDF defends its employment of AI tools and rejects “outright” the claim that they form part of a “policy to kill tens of thousands of people” with maximum efficiency. In a response to The Guardian, the IDF also sought to correct the newspapers’ characterization of Lavender, saying:
‘The [Lavender] ‘system’… is not a system, but simply a database whose purpose is to cross-reference intelligence sources, in order to produce up-to-date layers of information on the military operatives of terrorist organizations. This is not a list of confirmed military operatives eligible to attack.’
The IDF does not appear to deny that it is leveraging digital technology to increase “the productions of targets large-scale” (as an IDF public affairs statement put it). But consistent with the all-pervasive message of its spokespeople, the IDF argues that its targeting staff “work[] day and night to quickly close circles”.
In step with the IDF, the Israeli academics Tal Mimran and Gal Dahan also claim that the use of tools like Lavender and Gospel is only “a very preliminary” part of the IDF’s targeting process. Afterwards, they say, more careful analysis occurs when legal advisors and “more superior intelligence officers” receive the AI-generated recommendations in a “target room”. Mimran also claims that the machine-proposed targets are at times rejected and returned by the supervising lawyers and target engagement authority (i.e., the commander).
Reconciling the Accounts
Despite serious disagreements over framing, the two opposing accounts are not necessarily irreconcilable. Rather, on key aspects – such as what these AI tools actually do – the parties agree.
Based on the IDF’s technical description, Lavender appears to be an intelligence repository which allows users to visualize persons of interest (potential targets) in a viewable “layer”. Perhaps the layer is a map layer (like when you hit the “Coffee” button in your Google Maps app to search for nearby cafés) or a network diagram depicting people and their connections (as you might see on an office wall in a detective show). It is not made explicit. Either way, automation comes into play when the machine populates the layer with data. In Lavender’s case, the data is people. The compilation of this data appears to happen without active human prompting, but it is guided by pre-set parameters which describe what the targeteers are interested in.
Lavender, then, is an example of “supervised classification” where the human creates a model declaring that “a Hamas member has X, Y and Z attributes” and the machine finds Palestinians who fit into those classes before plonking their profiles onto a viewable layer. As with all such models, the explanatory power is limited by the chosen classifiers. For example, if an indiscriminate net is cast, the model might define the X attribute as “fighting aged”, the Y attribute as “male” and the Z attribute as “connected to somebody in Hamas’ political, military or administrative apparatus”. Such a low-resolution model will not tell the analyst a great deal about the person’s targetability. Certainly, to kill someone based solely on that output would be reckless and undeniably criminal.
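To make the mechanics concrete, here is a minimal, purely hypothetical sketch of that kind of supervised classification – a toy rule-based model in which every field name and threshold is invented for illustration, not a reconstruction of Lavender or of any real IDF system:

```python
# Hypothetical sketch only: a toy rule-based classifier of the kind described
# above, NOT a reconstruction of Lavender. All names and thresholds are invented.
from dataclasses import dataclass

@dataclass
class Profile:
    name: str
    age: int
    sex: str
    linked_to_listed_organization: bool  # any political, military or administrative link

def matches_model(p: Profile) -> bool:
    """A deliberately low-resolution model: X = fighting-aged, Y = male,
    Z = some connection to the organization's apparatus."""
    x = 16 <= p.age <= 60                # "fighting-aged" (illustrative threshold)
    y = p.sex == "male"
    z = p.linked_to_listed_organization
    return x and y and z

def populate_layer(profiles: list[Profile]) -> list[Profile]:
    """The machine's only job: place every matching profile onto a viewable
    'layer'. The output says nothing about targetability."""
    return [p for p in profiles if matches_model(p)]
```

The sketch makes the point visible in miniature: the layer is only as discriminating as the classifiers fed into it, and a coarse model will sweep enormous numbers of people into the “recommended” pile.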
According to the newspapers’ intelligence sources, Lavender’s classifiers are set by the IDF’s cyber-intelligence organization, Unit 8200. Those sources also allege that “during the first weeks of the war” 8200 “tweaked” the search parameters to open up the aperture. We do not know what those parameters are, but four of the newspapers’ sources say that the new model “generated” 37,000 Palestinian men onto the visualized layer as “recommendations” – that is, potential targets.
Gospel appears to work similarly, but for buildings. Imagine, for argument’s sake, that the IDF has in its digital intelligence holdings a sheaf of tagged, machine-readable reports about a particular militant. In some of these reports the same location might be mentioned again and again. These reports might be satellite images of a building, cell phone data (perhaps keyed to a mobile handset of interest) or reports arising from handler communications with covert human sources. As an information aggregator, Gospel appears to be able to say “hey, if you’re interested in this guy, you might be interested in this place”. It is entirely probable that in Gaza, that place would correspond with the militant’s family home.
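Again purely as a hypothetical illustration of that kind of aggregation – the report structure and names below are invented, and this is not a description of Gospel itself – the behaviour amounts to little more than co-occurrence counting:

```python
# Hypothetical sketch only: co-occurrence counting of the kind described above,
# NOT a reconstruction of Gospel. Report structure and names are invented.
from collections import Counter

# Each tagged, machine-readable report lists the people and places it mentions.
reports = [
    {"persons": ["militant_A"], "locations": ["apartment_block_12"]},
    {"persons": ["militant_A"], "locations": ["apartment_block_12", "mosque_3"]},
    {"persons": ["militant_B"], "locations": ["warehouse_7"]},
]

def locations_associated_with(person: str, reports) -> list[tuple[str, int]]:
    """'If you're interested in this guy, you might be interested in this place':
    rank locations by how often they co-occur with the person across reports."""
    counts = Counter()
    for r in reports:
        if person in r["persons"]:
            counts.update(r["locations"])
    return counts.most_common()

print(locations_associated_with("militant_A", reports))
# [('apartment_block_12', 2), ('mosque_3', 1)]
```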
Used correctly, these tools are not necessarily immoral technologies. Indeed, if the IDF employed enough intelligence analysts to do further analysis on each of the locations recommended by Gospel and each of the 37,000 individuals recommended by Lavender, the layers might make for a useful start state in targeting work – like a user-tailored, specialty search engine. But data aggregation is not “analysis”. And herein lies the ethical and legal issue: what happens after the machine has had a first look? Is there second-line analysis and vetting before identified targets are sent to the effects team that handles the strike?
The accounts differ but they are at least partially consonant.
The newspapers’ intelligence sources allege that humans are in the loop only in the sense that they act as a “rubber stamp”, averring that Lavender’s “recommendations” are valid targets that ought to be actioned. They allege that in practice the analyst spends about “20 seconds” deliberating before inputting their stamp of approval. As others have observed, it is difficult to see how an intelligence analyst’s good judgment can be appropriately exercised in a work environment driven by such a need for speed.
The IDF does not explicitly deny the 20-second workflow for analyst review but emphasizes that its targeting processes rely on multiple sources of information; that a Lavender recommendation does not denote a “confirmed” militant; and that (as we frequently hear from the Israeli government) the IDF’s procedures are fully compliant with its international humanitarian law obligations.
A Breakdown in the Operational Planning Process?
Both parties agree, then, that a human is technically in the loop. But are the newspapers’ intelligence sources correct when they claim that the timeline for review was recklessly compressed during the first weeks of the Gaza War?
The IDF’s own statements certainly suggest that its first wave of targeting decisions was made in an environment of great haste. 20 hours after the commencement of its air campaign on October 8th, the Israeli Air Force announced that 2,000 munitions had been dropped on 800 “targets”. 100 hours in, that number had ballooned to 6,000 bombs dropped on 3,600 “targets”. That is, targets were being nominated for strikes at a rate of approximately 36 targets per hour.
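The rate quoted there is just simple division of the IDF’s own published figures:

```python
# Simple division of the IDF's published figures quoted above.
print(800 / 20)      # 40.0 targets per hour over the first 20 hours
print(3_600 / 100)   # 36.0 targets per hour sustained over the first 100 hours
```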
Some targets were likely already “validated” as known Hamas locations (for example, recurring rocket launch sites). But proper precaution would demand that Israeli targeteers conduct robust and up-to-date “collateral damage estimates” (CDE) before prosecuting a previously examined target.
In their defence of IDF targeting processes, Mimran and Dahan suggest that another AI tool used for pre-authorized targets is called “Fire Factory” – which they describe as “an amalgamate of phase 2 (target development) and phase 3 (capabilities analysis) of the targeting cycle”. For military people with professional familiarity with the targeting cycle, this should read as an immediate red flag. Indeed, what such an “amalgamate” would actually represent is the collapse of a very important firewall that is supposed to exist between intelligence staff (whose “sense” function vests them with target development responsibilities) and operations staff (whose “act” responsibilities involve selecting and employing the appropriate munition for a strike). In a properly functioning military organization, the two sides should be clearly distinct, if not separate, mostly to ensure that operations staff are not “situating the estimate” – that is, to ensure that the killers are not also deciding who should be killed with an air-launched munition. If it is true that in the first weeks of the war pre-validated targets were fed straight into Fire Factory for an “amalgamate[d]” Stage 2/3 check, that would represent a serious breakdown in any reasonable operational planning process – one which carries significant inherent risks for massive numbers of civilians.
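To illustrate the structural point, and only that – the function names below are invented, and this is not a description of Fire Factory or of any real system – the difference between a firewalled process and an “amalgamated” one can be sketched in a few lines:

```python
# Schematic only: invented names illustrating the "firewall" discussed above,
# not a description of Fire Factory or of any real IDF system.

def develop_target(person_or_place):
    """Step 2 (intelligence / 'sense'): nominate and characterize a target."""
    return {"target": person_or_place, "status": "nominated"}

def review_gate(nomination):
    """The firewall: the target engagement authority and legal advisors sit
    between Steps 2 and 3 and may validate, reject or return the nomination."""
    nomination["status"] = "validated"   # or "rejected" / "returned"
    return nomination

def analyse_capabilities(validated):
    """Step 3 (operations / 'act'): weaponeering - munition selection, yield,
    blast radius, collateral damage estimation."""
    return {**validated, "munition": "selected by operations staff"}

# A properly firewalled flow keeps the two staffs distinct, with review in between:
strike_package = analyse_capabilities(review_gate(develop_target("building_X")))

# An "amalgamated" Step 2/3 tool, by contrast, collapses the gate entirely:
collapsed = analyse_capabilities(develop_target("building_X"))
```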
The tempo evident in the IDF’s public statements also raises questions about manpower availability and the mathematics of legal due diligence. The number of lawyers assigned to the IDF’s Southern Command (the headquarters responsible for prosecuting the war in Gaza) appears to be close-held information, but Mimran claims that a maximum of 50 lawyers on a rotating watch is a “close” estimate of current force composition.
If we assume that in the war’s first week (1) some of these lawyers were not already mobilized; (2) some were otherwise occupied with non-targeting duties; and (3) the remainder were split between, say, 12-hour shifts, this leaves a (generous) estimate of perhaps 10 lawyers on-station doing targeting determinations at any one time for the referent period. (Perhaps the IDF will one day publish the number of Gaza-focussed legal advisors during the month of October 2023.)
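On those admittedly rough working figures – roughly 10 lawyers on station, and the 36-targets-per-hour rate drawn from the IDF’s own statements – the arithmetic of individual review time is stark. This is illustrative only; actual IDF staffing and workflow numbers are not public:

```python
# Illustrative arithmetic only, using the generous working figures above;
# actual IDF staffing and workflow numbers are not public.
targets_per_hour = 36        # from the IDF's published 100-hour figures
lawyers_on_station = 10      # the article's generous shift estimate

targets_per_lawyer_per_hour = targets_per_hour / lawyers_on_station    # 3.6
minutes_per_target = 60 / targets_per_lawyer_per_hour                  # ~16.7

print(round(minutes_per_target, 1))   # roughly 17 minutes per target for all
                                      # intelligence, CDE and legal scrutiny combined
```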
Given the speed at which these targets were being validated (36 per hour), it seems implausible that such a small workforce could sustain diligent review. Even putting aside questions about the reliability of the underlying intelligence reporting on October 7th (recalling that the Israeli national security apparatus collectively failed to detect and respond to Hamas’ attack plans), it is difficult to believe that these advisers could make well-informed, dispassionate legal determinations at that pace in a social milieu of nation-wide trauma and fury. Either way, important steps were obviously skipped – all of it backdropped by a zeitgeist of rapid retaliation.
Reconciling the two accounts, then, is relatively straightforward. The Israelis correctly describe what the machines are designed to do (automated collation of intelligence information). They also describe how the targeting process is supposed to work, according to targeting doctrine co-opted from the US (which is cited in the Mimran and Dahan article).
The newspapers’ intelligence sources, meanwhile, describe how these tools were actually used in the wild. They became recommendation tools whose suppositions were promptly green-lit by target engagement authorities without much deliberation at all.
Thus, a plausible reconciled hypothesis is as follows: before October 7th, AI tools like Lavender, Gospel and Fire Factory were created to make a targeting cell’s life easier. The tools were designed to achieve this by creating fancy layers that visualize data meeting adjustable search parameters. They were designed to aid, but not replace, the human-in-the-loop by enabling better intelligence analysis and statistically useful collateral damage estimates. In the white-flashing heat of an unexpected and spectacularly violent war, though, the tools instead became something more. They became validation tools. In practice, they transformed into a digital check-in-the-box used to channel the outwardly aimed rage of the Israeli masses. In so doing, they became an accelerant for Israel’s collective revenge against Gaza’s civilian population.
As an academic with diverse and extensive prior military experience, I make it a point not to be overly critical of analysis in public discourse of targeting procedures or operations. Not once in my career as a judge advocate (the last 7 of 22 total years in the military) did I refer to or even account for characterizations in public discourse of targeting processes when developing legal advice for an organizational (military) client. If I spent all my time now as an academic refuting every mischaracterization of military targeting procedures I encounter in the public domain, I’m not sure this would leave time for anything else. Since it doesn’t really matter that much for actual military operators, I often ask myself: why bother? Forget it, I tell myself. Just ignore it and move on. Sometimes, though, ignoring egregious mischaracterizations of military targeting processes in public discourse is just not possible. Often, though not always, such an occasion emerges in response to characterizations developed by a publicist who also has at least some military experience. This is one such occasion. Though a bio (external to OJ) indicates the author has “previously served in the Australian Army in infantry roles,” there are…
I thank Cox for his bullet-pointed critiques. I will respond to each of his substantive points, in the order he raises them:
– It is not disputed that what Cox calls “target selection” (the correct nomenclature is “target development and prioritization”) and CDE are different parts of the targeting process. As my article stated, finding and selecting the target belongs to Step 2 of the process (an intelligence-heavy stage) while CDE belongs to Step 3 (operations heavy). Cox misreads my article and the underlying media reporting when he claims that the discussed IDF AI tools “are purportedly involved in identifying targets, not with evaluating the degree of incidental damage anticipated from an attack”. This appears to be true of Lavender and Gospel (which are Step 2 intelligence tools) but it is not true of Fire Factory. As described by Mimran, this tool appears to amalgamate Step 2 intelligence functions with Step 3 operations functions related to ammunition selection (presumably with yield and blast radius considerations in mind). This is exactly the point where CDE was introduced in my analysis: because we had reached a point at which operational effects and weaponeering decisions were under discussion. I do not resile from…
As an appendix to the above discussion, it is also worth giving consideration to the views articulated by Todd Huntley (himself a JAG with considerable operational targeting experience).
As he describes the role of the legal advisor in the targeting process, the “[requisite] level of understanding [about a strike’s lawfulness] cannot be obtained if the JAG enters the process moments before a strike. The JAG must attend the planning meetings and sit on the operations center floor. If a JAG is sitting in their office waiting for questions to come in, that JAG has failed”.
Compare Huntley’s preferred procedure with the “best case scenario” proposed by Cox. I find Huntley’s approach more appropriate for the very serious matter that is kinetic military targeting.
Source: https://www.lawfaremedia.org/article/airstrikes-civilian-casualties-and-role-jags-targeting-process
Thank you for presenting the points of clarification based on my comment, Chris. I have a few points to raise in response to your clarifications, which I will do here. Since we’re engaged in a mutual exchange based on your post, I’ll refer my observations to you directly rather than making general observations.
– On CDE. Your engagement with the topic of CDE in the main post suggests you have some general familiarity with CDE methodology (CDEM), but little (if any) direct and in-depth knowledge and experience with the subject. Your clarifying comment about a soldier preparing to throw a grenade performing a “‘field CDE’ of sorts” further supports the conclusion regarding your level of knowledge with CDEM. To demonstrate why that is, please allow me to draw directly from US military doctrine for the explanation of the proper use of field CDE. In the process, two caveats are in order. One, this excerpt from US military doctrine is from an instruction that has since been superseded – and the update is not available to the general public. However, the definition and treatment of formal and field CDE is substantially similar in the update. Second caveat – obviously, the IDF…
Final points:
– My comment that a grenade-throwing soldier performs their own “‘field CDE’ of sorts” was my way of conceding that some operational acts do not require a formal, fully-developed estimate of collateral damage, depending on the exigency of the circumstances. The use of the informal “of sorts” was off-hand and comparative – part of a broader observation that weapons users at all levels (be it a soldier, a JTAC or a target engagement authority authorizing the use of a GBU-43/B MOAB) have an obligation – derived from a common source of law – to consider the impact of their actions on civilians. It was not meant to be taken as a literal description of infantry doctrine: I am of course aware that a soldier throwing a grenade does not have the technology available to a JTAC (or an identical process for discriminating between targetable and non-targetable objects). In any case, I regret this throw-away comparison because it is a distraction from the actual issue. Where CDE is concerned, the real question is as follows: were the targets struck during the vital 100 hours “dynamic” (requiring Field CDE) or “deliberate” targets (requiring Formal CDE)? My view is that the…