What Came Before
Related Research & Projects: An Environmental Scan
Early Algorithm Audits
An overview of critical early algorithm audits should begin with the work of Joy Buolamwini. In her groundbreaking doctoral work, Buolamwini advocates for auditing digital products to probe, track, analyze, and publicize the biases inherent in algorithmic systems. Buolamwini, who coined the term "the coded gaze," began this research through her own lived experience of being misidentified by the very facial recognition systems she sought to study (Buolamwini).
Safiya Noble's Algorithms of Oppression observes how Google's autocomplete suggestions, while offering positive associations for words like "white," further perpetuate negative biases associated with the term "black women" and other racial categories (Noble).
In another illustrative example of this kind of work, Metaxa et al. have demonstrated through algorithmic auditing that image searches for common occupations on engines like Google yield results that favor white, male representations over other gender and racial representations, both propagating biases and misrepresenting the available demographic data for those occupations (Metaxa et al., "An Image of Society").
Auditing GenAI Outputs
More recent projects work directly with auditing GenAI outputs. Although most of these projects are not explicitly labeled algorithm audits, they certainly qualify as such under Metaxa et al.'s definition:
Journalists for Bloomberg conducted a rigorous experiment to audit the image generator Stable Diffusion for systemic biases, generating "more than 5000 images." The project prompted Stable Diffusion to create images of people's faces by profession and other salient labels, using the template "A color photograph of [profession/label]." The authors then indexed the gender and skin color of the resulting portraits/pixel collages. The final product is a piece of journalism that incorporates a variety of data visualizations, narrates the experiment, analyzes the findings, and uncovers enduring race, gender, and class biases in the system's output. For example, Nicoletti and Bass find that representations of "Architects" are largely legible as white men, while the faces representing "Housekeepers" are legible as darker-skinned women (Nicoletti & Bass).
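To make the mechanics of such an audit concrete, the following is a minimal sketch of a prompt-templated image audit in Python. It assumes the Hugging Face diffusers library and the publicly available runwayml/stable-diffusion-v1-5 checkpoint; the profession list, sample size, and output folder are illustrative stand-ins rather than the parameters Nicoletti and Bass actually used.

```python
# Minimal sketch of a prompt-templated image audit (illustrative only).
# Assumes: pip install torch diffusers transformers accelerate
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline

# Illustrative labels; the published audit used a much larger set.
PROFESSIONS = ["architect", "housekeeper", "judge", "fast-food worker"]
IMAGES_PER_LABEL = 10  # the Bloomberg audit generated far more per label

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

out_dir = Path("audit_images")
out_dir.mkdir(exist_ok=True)

for profession in PROFESSIONS:
    # Follows the "A color photograph of ..." template quoted above.
    prompt = f"A color photograph of a {profession}"
    for i in range(IMAGES_PER_LABEL):
        image = pipe(prompt).images[0]
        # Save each output for later manual or automated coding of
        # perceived gender and skin tone.
        image.save(out_dir / f"{profession.replace(' ', '_')}_{i:03d}.png")
```

Scaling up the label list and per-label sample size, and then coding each saved image for perceived gender and skin tone, yields the kind of matrix the Bloomberg team visualized.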
On a smaller scale, but driven by a similar curiosity, the thread "Why Does AI see Beauty This Way?" in a GenAI discourse community on Reddit makes observations based on an experimental dataset of 80 images. In this smaller audit, the experimenter collects and compares image outputs showing women's faces generated by Stable Diffusion in response to adjectives ranging from "gorgeous" to "ugly," each attached to distinct racial categories. Consistent defaults emerge across racial categories. "Old" is synonymous with "ugly." "Ugly" also correlates with unsmiling faces and uncombed hair. Looking at a matrix of image outputs in aggregate also demonstrates a lack of variety: imagine a sliding scale from Stereotypical Barbie to Weird Barbie.
Queer Representation in GenAI
Experiments on queer representation by image generators have also been conducted. Again in the journalism context, Reece Rogers' Wired article "Here's How Generative AI Depicts Queer People" offers an overview of these systems' problematic depiction of queer people, especially non-binary and trans people. In an academic forum, "Stereotypes and Smut: (Mis)representation of Non-cisgender Identities by Text-to-Image Models" qualifies as a related project. The authors, Ungless et al., "conduct a thorough analysis in which [they] compare the outputs of three image generation models for prompts containing cis-gender vs. non-cisgender identity terms." The authors offer the queried images, narrate the audit, and include data visualization to demonstrate that "certain non-cisgender identities are consistently (mis)represented as less human, more stereotyped and more sexualized" (Ungless et al.).
Independently of the above, two of the same authors also conducted an experiment, "Potential Pitfalls With Automatic Sentiment Analysis: The Example of Queerphobic Bias," which can serve as an example of text-centered auditing (Ungless et al.). Relatedly, a smaller experiment by Sabine Weber, shared as a post on QueerInAI's website (an invaluable resource), uses sentiment analysis to ascertain ChatGPT's response to "queer" and traces the model's reaction to requests to define "dyke," which leads to a content violation notice.
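As a rough illustration of what text-centered sentiment auditing can look like, here is a minimal sketch that scores a model's responses to identity terms with NLTK's VADER sentiment analyzer. The identity terms, placeholder responses, and the choice of VADER are assumptions made for demonstration; they are not the procedure Ungless et al. or Weber actually followed.

```python
# Minimal sketch of sentiment-based text auditing (illustrative only).
# Assumes: pip install nltk
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

# Placeholder responses; in a real audit these would be collected verbatim
# from a chatbot (e.g., by prompting it to define each term) and logged.
responses_by_term = {
    "queer": "Queer is an umbrella term embraced by many LGBTQ+ people...",
    "dyke": "[request refused; content violation notice returned instead]",
}

for term, response in responses_by_term.items():
    scores = sia.polarity_scores(response)
    # 'compound' ranges from -1 (most negative) to +1 (most positive).
    print(f"{term!r}: compound sentiment = {scores['compound']:+.3f}")
```

Aggregating compound scores over many prompts and terms, and comparing the resulting distributions, is one simple way to surface the kind of queerphobic skew these experiments document.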
Pedagogical Approaches
GenAI image experiments purposely connected to text experiments show up in the collaborative pedagogical practices of teachers of writing. One notable example comes from Maha Bali, an early instigator of the critical literacy agenda in writing studies, who includes GenAI images in her approach with students, documented in the short blog post "Where Are the Crescents in AI?" When Bali and her students probed GenAI's outputs in response to inputs about Palestine, text outputs demonstrated biases against Palestine, and those biases were even more readily detectable in image outputs. Image audits can cultivate a particular kind of literacy that allows writers to reckon with these systems' defaults so that they "will be better able to resist using AI in ways that are harmful or inappropriate" (Bali).
Where This Project Fits
So, while larger-scoped audits of text and of images mostly exist separately and investigate AI-generated representations in the context of, and in comparison to, lived experience, a comprehensive analysis of AI-generated images together with their connected texts, grounded in existing cultural artifacts, expands on these existing scholarly and journalistic projects.