The development of facial recognition software (FRS) has been driven by massive increases in the size of the databases underpinning its machine learning and identity verification infrastructures: the 1996 FERET database was the largest of its era at 14,126 images of 1,199 individuals; IBM's 2018 Diversity in Faces database contains more than one million faces; and, as a New York Times article details, Clearview AI claims a facial database of over 3 billion images (Hill, 2020). Driven by computer science rationales, such massive databases are deployed under the auspices of building faster and more accurate FRS systems. Yet, as a wealth of research has shown, this focus on speed, efficiency, and accuracy neglects key humanities concerns and perpetuates racial, gendered, and class biases (Buolamwini and Gebru, 2018; Garvie, Bedoya, and Frankle, 2016; Browne, 2015).
Our case study, “This Criminal Does Not Exist” (TCDNE), deliberately disrupts the logics of massive data extraction and processing, instead using the machine learning technique of a Generative Adversarial Network (GAN) with a minimal dataset to surface the biases inherent in law enforcement FRS systems. To begin, we created a GAN and trained it on 1,173 faces drawn from the Multiple Encounter Dataset (MEDS), a database composed of mugshots of deceased individuals (see Appendix A). TCDNE is a data visualization project grounded in minimal data and minimal computing: our training dataset is orders of magnitude smaller than those of typical FRS systems. From a computer science standpoint this is a liability, as a smaller dataset makes it harder for a neural network to generalize (Linjordet and Balog, 2019), but it is precisely this limitation that exposes the biases embedded in the underlying data. Interweaving our own rationales for its creation, we will detail the construction of TCDNE at the Ryerson University Collabratory and Digital Media Experience Labs. In contrast to computer science departments and corporations, the under-powered resources available to us, spanning hardware, machine learning compute, and physical space, further illustrate how digital humanists are often constrained on multiple fronts.
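The adversarial dynamic at the heart of TCDNE can be sketched minimally. What follows is not the project's actual architecture: a TCDNE-style model would use convolutional networks over mugshot images, whereas in this hypothetical sketch 1-D Gaussian samples stand in for the 1,173 MEDS faces, and both generator and discriminator are single linear maps with hand-derived gradients. The sketch shows only the adversarial loop itself: the generator learns to mimic the "real" distribution, and whatever that distribution over-represents is what the generator reproduces.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# "Real data": a hypothetical stand-in for the mugshot dataset.
# Whatever this distribution over-represents, the generator will learn to emit.
def real_batch(n):
    return rng.normal(4.0, 1.0, n)

# Generator g(z) = a*z + b; discriminator d(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr, batch = 0.05, 64

for step in range(2000):
    z = rng.normal(0.0, 1.0, batch)
    xf = a * z + b                 # fake samples from the generator
    xr = real_batch(batch)         # samples from the stand-in dataset

    # Discriminator ascent step: maximize log d(real) + log(1 - d(fake)).
    dr, df = sigmoid(w * xr + c), sigmoid(w * xf + c)
    w += lr * (np.mean((1 - dr) * xr) - np.mean(df * xf))
    c += lr * (np.mean(1 - dr) - np.mean(df))

    # Generator ascent step (non-saturating loss): maximize log d(fake).
    df = sigmoid(w * xf + c)
    upstream = (1 - df) * w        # gradient of log d(fake) w.r.t. xf
    a += lr * np.mean(upstream * z)
    b += lr * np.mean(upstream)

# After training, the generator's samples mirror the "real" distribution.
fake = a * rng.normal(0.0, 1.0, 1000) + b
print("mean of generated samples:", float(np.mean(fake)))
```

The point of the sketch is the same as the project's: a GAN has no access to ground truth beyond its training data, so the "most common" face it generates is a direct readout of what the dataset contains.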
Yet addressing humanist problems and scholarship with machine learning can produce effective results, even where a computer science perspective would likely dismiss the outcome as a “failure.” In particular, using machine learning techniques, we have been able to surface the “most common” type of face within the MEDS dataset; that the generated portraits are primarily of African American males speaks to the types of faces over-represented in these virtual spaces. Further, as a data visualization, TCDNE is indicative of contemporary state applications of FRS, bringing to light the clear biases inherent in the dataset, biases further perpetuated by algorithms trained on these types of datasets. TCDNE signals another potential set of tactics and research-creation paths for digital humanists in general and, more specifically, for those interested in simultaneously educating the public about the nature of problematic facial datasets while producing arguments about the ethical implications of such databases and their in-built classification practices.
Buolamwini, Joy, and Timnit Gebru. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” Conference on Fairness, Accountability, and Transparency, Proceedings of Machine Learning Research 81 (2018): 1–15.
Browne, Simone. Dark Matters: On the Surveillance of Blackness. Duke University Press, 2015.
“Diversity in Faces Dataset.” IBM Research. https://www.research.ibm.com/artificial-intelligence/trusted-ai/diversity-in-faces/. Accessed Jan. 26, 2020.
“Face Recognition Technology: FERET.” National Institute of Standards and Technology. https://www.nist.gov/programs-projects/face-recognition-technology-feret. Accessed Jan. 26, 2020.
Garvie, Clare, Alvaro M. Bedoya, and Jonathan Frankle. “The Perpetual Line-Up.” Georgetown Law Center on Privacy and Technology, 2016. https://www.perpetuallineup.org. Accessed Jan. 26, 2020.
Hill, Kashmir. “The Secretive Company That Might End Privacy as We Know It.” New York Times, Jan. 18, 2020. https://www.nytimes.com/2020/01/18/technology/clearview-privacy-facial-recognition.html. Accessed Jan. 26, 2020.
Linjordet, Trond, and Krisztian Balog. “Impact of Training Dataset Size on Neural Answer Selection Models.” Advances in Information Retrieval, vol. 11437, 2019, pp. 828–835.
“Multiple Encounter Dataset (MEDS).” National Institute of Standards and Technology. meds. Accessed Jan. 26, 2020.