The existing evidence base about gender diversity in the AI workforce is, however, not without its limitations. It is mostly based on small samples that, although highly relevant (technology industry workforce, papers presented at prestigious conferences), are not necessarily representative of the wider AI research workforce. Existing studies also tend to ignore whether the situation in AI is the same as, better than or worse than in other STEM disciplines, and do not consider variation between countries that might help to identify practices and policies that could improve the situation. They also tend to assume that increasing gender diversity will directly change the nature of the AI research that is produced in ways that increase the inclusiveness of its benefits and reduce its risks, yet this assumption remains untested. In some cases, the evidence relies on commercial data and on analyses that are hard to reproduce. As the AI Index report notes, ‘a significant barrier to improving diversity is the lack of access to data on diversity statistics in industry and in academia’.
Here, we use a larger dataset from arXiv, an online preprint repository widely adopted by AI researchers, enriched with geographical, discipline and gender information, to address some of these limitations and thereby improve the evidence base about gender diversity in AI research.