11 Oct The Use of AI at the ICC: Should we Have Concerns? Part II
[Gabrielle McIntyre is Chair of Women’s Initiatives for Gender Justice, Co-ordinator of Africa Legal Aid’s Gender Mentoring Programme for International Judges, and an independent international law consultant. Nicholas Vialle is a Pro Bono Lawyer (human rights, refugee and migration law), Australia, and an independent international human rights law consultant.]
This brings us to the second issue concerning the development of AI which may raise concerns at the ICC, particularly in relation to commitments to gender equality and non-discrimination, which are at the core of the cultural framework of the ICC and embedded within the Rome Statute’s legal framework. As already noted, AI is largely developed in a homogeneous environment made up predominantly of white men, in workplaces where discrimination, including sexual discrimination and harassment, is perpetrated against women with impunity. According to UNESCO, only 12% of artificial intelligence researchers are women, and only 6% of professional software development in the field of AI is carried out by women. Further, women working in the technology sector suffer a 28% pay gap and reportedly leave the field in disproportionate numbers, citing gender bias, discrimination and harassment. In October 2018, The New York Times exposed Google’s handling of cases of sexual assault and harassment against women, documenting its long track record of ignoring such claims. In the same year, a class action suit was brought by women in technical roles at Microsoft concerning Microsoft’s lackluster handling of complaints of sexual harassment and discrimination. It was even claimed that a female intern was forced to work alongside a man who she alleged had raped her, even after she reported the rape to the police, her supervisor, and human resources.
As these reports suggest, the values of the organizations with which the OTP may be partnering could well be antithetical to those of the ICC, an organization with zero tolerance for workplace misconduct and an overriding commitment to ensuring gender equality and respect for diversity. In that regard, there is an abundance of evidence to demonstrate that AI tools have not only threatened long-held goals of gender equality but have the potential to accelerate gender inequality – an issue which should be of principal importance to the ICC. Further, AI tools have also been shown to discriminate on other grounds, including race, further undermining Rome Statute commitments to non-discrimination.
Recalling strategic Goal 3 of the ICC’s strategic plan to “further develop mainstreaming of a gender perspective in all aspects of the Court’s work”, there appears to have been no consideration given to how the development and use of AI tools will respect this goal, or to how the OTP will ensure that its use and development of AI tools does not offend the Rome Statute’s fundamental principle of non-discrimination by ensuring that biases are not embedded in its AI systems.
Unfortunately, there are many ways in which bias can find its way into an AI tool: the initial coding of the algorithm; the structure of the data fed into the system to train the AI prior to deployment, which in machine learning models effectively adapts the algorithm itself; the methods of quality assurance or refinement of an AI’s outputs; the prompt that is put into the AI tool by the user; and the human who interprets the output. Each of these variables has the potential to introduce bias that could affect an AI’s output. Further, research has clearly shown that gender biases are found in datasets in general and in training datasets in particular. Thus, if the data contains certain biases, these will be replicated by the algorithm and can even be exacerbated by it.
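The mechanism by which a skewed training set becomes a skewed output can be illustrated with a deliberately minimal sketch. The data and occupations below are entirely hypothetical; the point is only that a model which learns from frequency will faithfully reproduce whatever imbalance its training data contains:

```python
from collections import Counter

# Hypothetical, deliberately skewed training data of (occupation, gender) pairs,
# of the kind that web-scraped corpora have been shown to contain.
training_data = [
    ("nurse", "female"), ("nurse", "female"), ("nurse", "female"), ("nurse", "male"),
    ("president", "male"), ("president", "male"), ("president", "male"), ("president", "female"),
]

def train(pairs):
    """Learn, for each occupation, the most frequent gender label in the data."""
    counts = {}
    for occupation, gender in pairs:
        counts.setdefault(occupation, Counter())[gender] += 1
    return {occ: c.most_common(1)[0][0] for occ, c in counts.items()}

model = train(training_data)
# The 3:1 skew in the data becomes the model's rule: every nurse is predicted
# "female" and every president "male", regardless of the individual case.
print(model["nurse"])      # female
print(model["president"])  # male
```

Nothing in the code above is "biased" in itself; the bias arrives entirely through the data, which is why auditing training sets matters as much as auditing algorithms.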
Importantly, every database has a point of view. What a researcher finds in the process of searching depends heavily on who builds the algorithm and what choices the programmer makes in that process. In that regard, the impact on AI development of the lack of diversity among developers at big tech companies has been exemplified in many ways. With respect to AI facial recognition technology, the developers’ choice to train on datasets of white male faces, insufficiently representative of the population to be predicted, has made it difficult for these AI tools to recognize female faces or faces of color. Studies have shown that darker-skinned females are the most misclassified group, with error rates of up to 35%. As the OTP has indicated its intention to use these tools, the potential implications of such errors are not inconsequential. Indeed, in the US, facial recognition has been implicated in numerous criminal cases, leading to mistaken identity and the arrest of the wrong person – predominantly the wrong Black person. Further, and as this example suggests, the use of AI technologies can result in an inversion of the burden of proof, potentially eroding the presumption of innocence.
The same type of structural bias is evident in software that automatically transcribes and translates evidence, another AI tool intended for use by the Office of the Prosecutor. For example, Google speech recognition software has been trained on male voices, rendering it 70% more likely to accurately recognize male speech than female speech. In terms of automated language translation, research has demonstrated that Google Translate systematically changes the gender of translations when they do not fit stereotypes. For example, “the female President” in German is translated to “the President” in Italian, and “the male nurse” in German is translated to “the female nurse” in French. According to Google, this results from the necessity of using English as a bridging language when bilingual data does not exist for a language pair – an explanation that has not gone unchallenged. Language translation software has also been shown to systematically translate women’s voices as male in male-dominated professions, while other studies have demonstrated that automated translation engines amplify the bias of their training datasets. For example, a translation engine trained on a dataset in which cooking was associated 33% more frequently with women produced results in which cooking was associated 68% more frequently with women.
Yet other studies show AI systems analyzing data and reproducing historical structural discrimination in society. This issue had been highlighted by use of recidivism prediction tools which exhibited bias against black defendants who were far more likely than white defendants to be incorrectly judged to be at a higher risk of recidivism while white defendants were more likely than black defendants to be incorrectly flagged as low risk. The software designers claimed that the discrimination in the data reflected the available data statistics and thus the AI outcomes effectively mirrored existing discrimination in society.
Similar research on text-based machine learning showed that software trained on articles collected from google news adopted sexist views of women, while the machine learning Chatbot, Tay, developed by Microsoft and trained on a diet of Twitter, Quora, and Reddit posts, and based on its further learning from user engagement, referred to feminism as a ‘cult’ and ‘a cancer’ and stated that gender equality equaled feminism within 15 hours of its public release. Further, AI tools for rapid pattern identification machine learning AI based on massive datasets of unlabeled images from the web have been shown to automatically learn gender, racial and intersectional biases from the way people are stereotypically portrayed on the web. Noting the OTP’s intention to partner with big data holders to develop tools for rapid pattern identification potential bias in the data sets used to train AI tools adopted by the OTP could greatly undermine the integrity of AI outputs and entrench gendered and other biases into the work of the ICC.
However, what some of these examples also illustrate is that machine learning AI learns from its users and its algorithm is adapted over time in response to data cleaned from its users. In that respect, AI is said to do no more than hold a mirror up to society – it is not the AI that is inherently bias or discriminatory but the humans who create and use it – each leaving a trace of their own values. In that regard, according to UNESCO not only are men the primary developers of AI but they are also the primary users of it leading to further entrenchment of discrimination against women and other minority groups in AI outputs.
The impact of the user may have particular relevance for the deployment of AI tools at the ICC. For example, the Prosecutor has indicated that AI will be deployed in the process of discovery, and we can imagine a machine learning algorithm being trained on data where the desired outcome is the identification of case relevant evidence. While this may be the original intention of the AI system as the Prosecutor’s focus is the presentation of their own case it is reasonable to assume the Prosecutor will have a natural bias – as opposed to intention – towards finding inculpatory evidence. As such, the algorithm may be asked to search more often and be rewarded more often for correctly identifying inculpatory evidence over exculpatory evidence. As a result, the AI tool will learn overtime to embed the bias of the Prosecutor into its processes and focus on finding inculpatory evidence at the expense of exculpatory evidence which could impact on the fair trial rights of the defence. Moreover, it could do so without anyone actually being cognizance of the changes to the algorithm that had taken place due to the user data.
In that regard, it must also be recalled that AI tools do not understand their outputs – they are not sentient. They are computational processes programmed to be responsive to their users and the output of the AI system will be influenced by what is asked. The impact of the inputs becomes clear when looking at reports that DALL-E 2 was covertly adding words to users prompts to increase the diversity of its output. Thus when considering bias, what prompt is given by the user – and the AI will learn from rewards given by the user – (that is a numerical value that indicates how well the system is performing its task with the system trying to maximize the total reward it receives over time by learning from its actions and feedback)- will impact the AI’s outputs.
Finally, bias can also come in through the humans that interpret the data derived from AI systems. As already noted, AI systems are vulnerable to bias and prone to hallucinations Relevant to criminal law are also automation bias and confirmation bias. Automation bias is believing that because the data came from an algorithm it must be objective which is not necessarily true. This type of bias can have particular significance in the criminal justice process where AI is being used to undertake complex tasks that are not easily amendable to human verification. Confirmation bias involves seeing what you already believe, that is searching for, interpreting, favoring, and recalling information in a way that confirms or supports prior beliefs or values, a potential hazard when it comes to prosecutions, which are generally premised on the prosecutor’s theory of a case. A simple example which exhibits both types of biases is the case of the US lawyer who relied on ChatGPT for legal research and filed before the Court six legal cases as supporting precedents which did not exist. When the lawyer had queried ChatGPT whether the authorities were real, he was assured by ChatGPT that they were and could be accessed on LexisNexis and Westlaw. The lawyer did not verify this for himself but relied on ChatGPT, stating when the falsehood was discovered, that he was unaware that ChatGPT could provide false information – automation bias – but he may also have been influenced by the help the cases identified by ChatGPT gave to his case – confirmation bias.
In conclusion, while we applaud the innovations adopted by the Office of the Prosecutor and recognize both their inevitability and their potential to revolutionize ICC work practices there may be reason to be concerned about the consistency of the use of AI at the ICC with the Rome Statute. As we have set out, the data AI systems rely on may have been obtained in violation of human rights and AI outputs are governed by the application of opaque algorithms that are not easily contestable impugning the principle of equality of arms and the defendant’s right to an adversarial process. Moreover, AI tools are designed by a limited number of predominately white male humans operating in work environments typically discriminatory towards women and whose biases have been shown to be intentionally or unintentionally embedded into the AI systems they have developed in ways antithetical to the values of the ICC. Further, as AI applications are based on data generated by humans, again mainly white male humans, they can amplify discrimination by replicating or reinforcing existing prejudices and inequalities in society undermining the Rome Statute’s commitment to non-discrimination. Moreover, as AI systems are not immune to unreliable outcomes there is concern that automation bias or confirmation bias may unduly influence assessment of AI outputs to the detriment of the fairness of proceedings.
In this context, it is not only critically important for lawyers and Judges to understand how AI is being used by the OTP at the ICC but such development and use must be governed by a clear and robust legal framework that ensures full adherence to international human rights standards, including transparency to facilitate contestability. While we found no evidence of it, it may well be that internally the OTP is well attuned to these issues and is taking appropriate measures to ensure that its development and use of AI is fully consistent with the Rome Statute framework. If that be so, we may well have reason to celebrate the efficiency gains anticipated to be derived from technological advancements in the work practices of the OTP.