UN member states are meeting in Geneva on September 24 to continue discussions of weapon systems powered by artificial intelligence, known by the acronym LAWS—lethal autonomous weapons systems—or “killer robots,” as campaigners and activists prefer to call them. Delegates will come together to negotiate a draft paper outlining aspects of a possible normative and operational framework addressing these weapons systems.
One of the many points of contention is a proposed reference to “social biases, including gender and racial bias” as a key consideration. While a number of states and civil society organizations welcomed the reference, others opposed it, citing, among other reasons, the need for further research to assess whether artificial intelligence (AI) systems are biased at all.
This is surprising, given the growing literature documenting and analyzing examples of gender and racial biases in AI, which shows that algorithmic models encode bias in at least two ways. First, data-based systems reproduce existing inequalities. A 2016 study of a computer program designed to evaluate the potential for recidivism for the criminal justice system found that Black defendants in the United States were twice as likely to be categorized as high risk. The analysis included another, less-mentioned detail: the system’s predictions were also uneven across genders, making women appear to be at higher risk than they actually are.
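The kind of disparity the recidivism study documented can be made concrete with a simple calculation. The sketch below uses entirely synthetic, hypothetical data (it is not drawn from the study itself): it measures how often a risk classifier wrongly flags people who did *not* reoffend as high risk, broken out by group, showing how one model can expose one group to twice the risk of a mistaken label.

```python
# Illustrative sketch with synthetic data: auditing a risk classifier
# for uneven false positive rates across demographic groups.

def false_positive_rate(records, group):
    """Share of non-recidivists in `group` wrongly flagged as high risk."""
    negatives = [r for r in records if r["group"] == group and not r["reoffended"]]
    flagged = [r for r in negatives if r["predicted_high_risk"]]
    return len(flagged) / len(negatives)

# Hypothetical records: each person's group, whether they actually
# reoffended, and whether the model flagged them as high risk.
records = [
    {"group": "A", "reoffended": False, "predicted_high_risk": True},
    {"group": "A", "reoffended": False, "predicted_high_risk": True},
    {"group": "A", "reoffended": False, "predicted_high_risk": False},
    {"group": "A", "reoffended": False, "predicted_high_risk": False},
    {"group": "B", "reoffended": False, "predicted_high_risk": True},
    {"group": "B", "reoffended": False, "predicted_high_risk": False},
    {"group": "B", "reoffended": False, "predicted_high_risk": False},
    {"group": "B", "reoffended": False, "predicted_high_risk": False},
]

for g in ("A", "B"):
    print(g, false_positive_rate(records, g))
# In this synthetic sample, group A's false positive rate (0.50) is
# double group B's (0.25): same model, unequal burden of error.
```

The point of the exercise is that a model can look accurate overall while distributing its mistakes unevenly; the disparity only becomes visible when error rates are computed per group.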
Second, existing data sets and algorithms skew toward white males, meaning that women of color, for example, are significantly less likely to be intelligible to machine learning programs trained to recognize images and voices. When translating between languages, machine learning models are more likely to use the pronoun “he” rather than “she” for gender-neutral terms; women, moreover, are less likely to be shown ads for higher-paying jobs.
Such occurrences of bias are not one-off events. A review of publicly available information on 133 biased AI systems, deployed across different economic sectors from 1988 to 2021, found that 44.2 percent (59 systems) exhibited gender bias and that 25.7 percent (34 systems) exhibited both gender and racial biases.
While we can easily point to evidence of bias in civilian applications of AI, less research exists on military applications of AI and how they may draw on and reproduce inequalities. Nonetheless, it is possible to extrapolate from existing models. Consider, for example, a machine translation program used by military intelligence that renders a person of unspecified gender as male. Or an algorithm designed to recruit military personnel that might pass over qualified female candidates given their historically low levels of participation in the armed forces. Or a voice control system that does not recognize the voice of a female pilot. Or an automated system designed to provide emergency relief that does not include provisions specific to women and girls.
Beyond these considerations, one must contemplate the potential consequences of gender and racial biases in the autonomous weapon systems being discussed at the UN. The criteria that will inform who is and is not a combatant—and, therefore, a target—will likely involve gender, age, race, and ability. Assumptions about men’s roles may miscategorize civilian men as combatants due to encoded gender biases among human operators as well as within the data-driven process itself.
The concern is that machine algorithms may amplify existing humanitarian violations. Facial recognition systems could make men, regardless of their actual combatant or civilian status, hyper-visible as targets. Biased data sets and inadequately trained algorithms may mean that women of color would be misrecognized at a higher rate, leaving them exposed to differential risks. A woman of color in the proximity of a military fortification may not be recognized as human by an AI system. A group of individuals, including women and children, whose faces are partially obscured, may be determined to be military-aged males instead.
To understand the risks associated with gender bias in data sets and algorithms used in autonomous weapons systems, the United Nations Institute for Disarmament Research (UNIDIR) is conducting research on gender norms in military technologies involving AI. The objective is to uncover the gendered assumptions embedded in machine learning models and identify measures to respond to and mitigate potential harmful consequences.
Although the project is still ongoing, one of its recommendations will be to establish a gender audit of military AI that addresses the potential for bias throughout the design, testing, and deployment of such technologies. Data sets and algorithms for military AI should be evaluated to test for different outcomes based on gender, age, race, and ability. Experts in gender and race should be consulted throughout the conceptualization, design, development, use, and disposal of these systems. Particular attention needs to be paid to data associated with wartime relief to ensure that this information is not later weaponized.
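To make “evaluated to test for different outcomes” concrete, here is a minimal sketch of one automated check such an audit might include: comparing a system’s selection rates across groups and flagging large gaps. The data, field names, and the recruitment scenario are hypothetical and do not represent any UNIDIR recommendation; the 0.8 threshold borrows the US EEOC “four-fifths” rule of thumb purely for illustration.

```python
# Hypothetical sketch of a disparate-impact check a gender audit
# might run against a selection system (e.g., recruitment).

def selection_rate(outcomes, group):
    """Fraction of candidates in `group` the system selected."""
    members = [o for o in outcomes if o["group"] == group]
    return sum(o["selected"] for o in members) / len(members)

def disparate_impact(outcomes, group_a, group_b):
    """Ratio of the lower selection rate to the higher one (1.0 = parity)."""
    a, b = selection_rate(outcomes, group_a), selection_rate(outcomes, group_b)
    lo, hi = sorted((a, b))
    return lo / hi

# Synthetic outcomes for illustration only: 1 of 5 women selected
# versus 2 of 5 men.
outcomes = (
    [{"group": "women", "selected": s} for s in [1, 0, 0, 0, 0]]
    + [{"group": "men", "selected": s} for s in [1, 1, 0, 0, 0]]
)

ratio = disparate_impact(outcomes, "women", "men")
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # "four-fifths" rule of thumb, used here for illustration
    print("flag for review: selection rates differ substantially by gender")
```

A real audit would look beyond a single ratio—at error rates, intersectional groups, and the provenance of the training data—but even a check this simple would surface the recruitment scenario described above.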
In view of the rapid pace of technological developments, this issue has become more urgent. The fact that further research is needed to assess bias in military applications of AI should not be used as a justification to turn a gender-blind eye to the potential consequences of autonomous weapons systems. Previous research from civilian applications shows that, rather than providing a corrective to human prejudice, machine models can actually replicate and even amplify systemic inequalities. Diplomats taking part in UN discussions have a chance to acknowledge this in future negotiations, and find ways to protect civilians and prevent harm.
Dr. Katherine Chandler is an assistant professor in the School of Foreign Service at Georgetown University and a consultant with the United Nations Institute for Disarmament Research.