Emerging Voices: Computational Analysis of International Law – Using Text-As-Data Tools to Investigate International Investment Agreements
[Wolfgang Alschner (PhD in International Law, JSM (Stanford)) is a post-doctoral researcher at the World Trade Institute in Bern and the Graduate Institute in Geneva specializing in text-as-data analysis of international law.]
As international law scholars we are overwhelmed with information. The United Nations Treaty Series alone contains more than 50,000 treaties. Add to that the many thousands of decisions by international courts and organizations, a body that grows day by day. Just keeping abreast of a sub-field of international law is a full-time job. Not only academics but also beneficiaries of international law are affected by this information overload. A recent UNCTAD report pointedly concluded that international investment law has become “too big and complex to handle for governments and investors alike”. Lest we drown in the rising tides of information and complexity, we need to find novel ways to digest and analyze international law materials.
Computational analysis of international law promises such a new way. Not only do computers not grow tired or grumpy when reading through thousands of documents, but they also find patterns in data that humans would not be able to spot. To be sure, robot lawyers are not going to replace human researchers any time soon – nor should they. But the interaction between computers crunching numbers and scholars interpreting results does provide new and exciting opportunities to tackle international law’s big data problems. In this post, I will highlight four examples from computational international investment law research that I conducted together with Dmitriy Skougarevskiy. They showcase some of the insights revealed through computer-assisted approaches that would have been difficult or impossible to gain using traditional human-led research.
Dmitriy and I have investigated over 2,100 International Investment Agreements (IIAs) and their 24,000 constituent articles. Using a computational approach similar to that employed in plagiarism detection software, we were able to empirically demonstrate four hitherto unknown or only anecdotally presumed aspects of the IIA universe: asymmetry in negotiations, the evolution of national investment treaty programs, the diffusion of treaty design, and the innovations achieved in recent mega-regional agreements. To allow researchers and other stakeholders to engage with our findings directly and interactively, we have created the open-access website www.mappinginvestmenttreaties.com.
The simple yet powerful text-as-data procedure we employ in our research consists of four steps. First, we collect treaty full texts and split them into their constituent articles. Second, we represent each treaty and article based on its consecutive 5-character components. The phrase “shall be permitted” is thus represented as “shall”, “hall_”, “all_b”, “ll_be”, “l_be_”, “_be_p”, “be_pe”, “e_per”, “_perm”, “permi”, “ermit”, “rmitt”, “mitte”, “itted” (“_” signifies space). Third, we compare the textual similarity between two treaties or articles based on the 5-character components they have in common, calculating what is formally known as a Jaccard distance – a measure of dissimilarity ranging from 0 (100% similarity) to 1 (0% similarity). The phrase “shall be permitted” and a second phrase “shall not be permitted”, for instance, share 11 components. They differ only because the word “not” adds the 5-character components “all_n”, “ll_no”, “l_not”, “_not_”, “not_b”, “ot_be”, “t_be_” to the second phrase and removes “all_b”, “ll_be”, “l_be_” from it. With 11 shared components out of 21 distinct ones, this yields a Jaccard distance of 1 − 11/21 ≈ 0.48. Finally, since Jaccard distances by themselves do not tell us much, we compare Jaccard distances across sets of documents. Such comparison allows us to see where treaty language converges or diverges, uncovering latent patterns in our data, four of which we will present here.
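The second and third steps can be sketched in a few lines of Python. This is a minimal illustration, not the code behind our research: the function names `char_ngrams` and `jaccard_distance` are made up for this example, and the sketch applies no text normalization (such as lowercasing or stripping punctuation), which a real pipeline might well add.

```python
def char_ngrams(text, n=5):
    """Return the set of consecutive n-character components of a text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_distance(a, b, n=5):
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B| over n-character components."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Assumption: two texts too short to yield any component count as identical.
        return 0.0
    return 1 - len(grams_a & grams_b) / len(union)


first = "shall be permitted"
second = "shall not be permitted"

print(len(char_ngrams(first)))    # 14 components
print(len(char_ngrams(second)))   # 18 components
print(round(jaccard_distance(first, second), 2))  # 0.48
```

Because components are stored as sets, a component that recurs within one text is counted only once, which is what the Jaccard measure requires.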
First of all, our metric revealed a stark asymmetry in investment treaty making. While rich countries achieve highly consistent treaty networks whose design closely corresponds to the model template they employ, poorer states are party to patchworks of textually diverse treaties. Put differently, a computational assessment of textual similarity allows us to empirically show in a systematic, objective and replicable manner that developed countries tend to be the system’s rule-makers while developing countries are its rule-takers.
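One plausible way to turn pairwise distances into a consistency score for a whole treaty network is to average the Jaccard distance over every pair of treaties a country has signed. The sketch below assumes that aggregate; the post does not spell out the exact statistic used, and the function names as well as the two toy "networks" are invented purely for illustration.

```python
from itertools import combinations
from statistics import mean


def char_ngrams(text, n=5):
    """Return the set of consecutive n-character components of a text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_distance(a, b, n=5):
    """Jaccard distance over n-character components (0 = identical sets)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return 1 - len(grams_a & grams_b) / len(union) if union else 0.0


def mean_pairwise_distance(texts, n=5):
    """Average distance over all unordered pairs; needs at least two texts."""
    return mean(jaccard_distance(a, b, n) for a, b in combinations(texts, 2))


# Hypothetical toy data: a "rule-maker" reusing one template, and a
# "rule-taker" whose treaties follow three different partners' wording.
consistent_network = [
    "investments shall be accorded fair and equitable treatment",
    "investments shall be accorded fair and equitable treatment",
    "investments shall at all times be accorded fair and equitable treatment",
]
patchwork_network = [
    "investments shall be accorded fair and equitable treatment",
    "each party shall encourage favourable conditions for investors",
    "neither party shall expropriate investments without compensation",
]

print(mean_pairwise_distance(consistent_network))
print(mean_pairwise_distance(patchwork_network))
```

A lower average indicates a more internally consistent network. On real data the inputs would be full treaty texts rather than single clauses.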
Second, Jaccard distances also shed light on consistency and innovation in national investment treaty programs. Some countries like the United Kingdom have only made cosmetic changes to their investment agreements over time. The country’s network of 110 bilateral investment treaties (BITs) concluded between 1975 and 2009 is thus the most consistent in the world. Other states have continuously updated their investment treaties. Our metric allows us to detect major changes in treaty design such as when the United States revamped its model agreement in 2004. Less well-known innovations, such as the Finnish shift in 1999 to a pre-establishment treaty model that combines investment protection with capital liberalization, are also made visible. Our metric thus provides a means to inductively investigate the evolution of national treaty programs.
Third, our approach enables us to trace treaty design diffusion. We observe that some countries copied and pasted almost entire treaties from third states. Israel, for instance, drew heavily from British BITs when devising its own BIT program. Hungary, the Czech Republic and Slovakia, in turn, used the BITs they concluded with each other in January 1993 as templates for their subsequent treaty negotiations, resulting in strikingly similar agreements. Diffusion also happens at the clause level. We discovered, for instance, that the language of a public policy exception first appearing in Article 11 of the 1985 BIT between Singapore and China later diffused to India, Mauritius and half a dozen African countries. What makes the clause special is that it was conceived and is exclusively being used by developing countries, making it one of the rare treaty design innovations in investment law that is indigenous to the Southern hemisphere.
Fourth, the approach we developed allows us to assess the novelty of newly concluded agreements. The Trans-Pacific Partnership (TPP), for instance, was initially heralded as a “new and high standards agreement”. Our metric reveals how new it actually is and how high the standards are that it sets. We found that 81% of the text of the TPP investment chapter is taken verbatim from the 2006 USA-Colombia Free Trade Agreement. The remaining 19% is mostly used to clarify and further refine already existing standards. Hence, while it is true that the TPP investment chapter sets higher standards as compared to some of the earlier BITs with which it overlaps, it is very much a continuation of prior US practice rather than an IIA 2.0.
Computational analysis of international law thus provides an efficient and effective way to investigate the hidden structures of the international investment law universe, revealing new and surprising insights. At the same time, the presented research offers only a glimpse of the multitude of opportunities that computational international law still holds in store. As computers turn the flood of legal information from a burden into a resource, hitherto impossible research avenues are opening up, from the quantification of international law’s fragmentation to the investigation of state practice and opinio juris in 195 countries. Exciting times lie ahead.