Laissez-Faire Harms: Algorithmic Biases in Generative Language Models
- URL: http://arxiv.org/abs/2404.07475v2
- Date: Tue, 16 Apr 2024 04:07:42 GMT
- Title: Laissez-Faire Harms: Algorithmic Biases in Generative Language Models
- Authors: Evan Shieh, Faye-Marie Vassel, Cassidy Sugimoto, Thema Monroe-White
- Abstract summary: We show that synthetically generated texts from five of the most pervasive LMs perpetuate harms of omission, subordination, and stereotyping for minoritized individuals.
We find widespread evidence of bias to an extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner compared to representative or empowering portrayals.
Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid deployment of generative language models (LMs) has raised concerns about social biases affecting the well-being of diverse consumers. The extant literature on generative LMs has primarily examined bias via explicit identity prompting. However, prior research on bias in earlier language-based technology platforms, including search engines, has shown that discrimination can occur even when identity terms are not specified explicitly. Studies of bias in LM responses to open-ended prompts (where identity classifications are left unspecified) are lacking and have not yet been grounded in end-consumer harms. Here, we advance studies of generative LM bias by considering a broader set of natural use cases via open-ended prompting. In this "laissez-faire" setting, we find that synthetically generated texts from five of the most pervasive LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) perpetuate harms of omission, subordination, and stereotyping for minoritized individuals with intersectional race, gender, and/or sexual orientation identities (AI/AN, Asian, Black, Latine, MENA, NH/PI, Female, Non-binary, Queer). We find widespread evidence of bias to an extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner compared to representative or empowering portrayals. We also document a prevalence of stereotypes (e.g. perpetual foreigner) in LM-generated outputs that are known to trigger psychological harms that disproportionately affect minoritized individuals. These include stereotype threat, which leads to impaired cognitive performance and increased negative self-perception. Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models and invest in critical AI education programs tailored towards empowering diverse consumers.
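The core of the "laissez-faire" setting is conceptually simple: prompt an LM with an open-ended scenario that never specifies identity, then record who appears in the generated text and how they are portrayed. The sketch below is an illustrative approximation of that measurement idea, not the authors' pipeline; `generate` and `classify_portrayal` are hypothetical stand-ins for an LM API call and a human or model-based annotation step.

```python
from collections import Counter, defaultdict


# Hypothetical stand-ins (not from the paper): `generate` wraps any LM API call,
# and `classify_portrayal` maps one generated story to (identity_group, portrayal),
# e.g. ("Black Female", "subordinated").
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in an LM API call here")


def classify_portrayal(story: str) -> tuple[str, str]:
    raise NotImplementedError("plug in human or model-based annotation here")


def portrayal_ratios(prompt: str, n_samples: int = 1000) -> dict[str, float]:
    """For an open-ended prompt with no identity terms, estimate how many times
    more often each identity group is portrayed as subordinated than as
    empowered across n_samples generations."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for _ in range(n_samples):
        story = generate(prompt)
        group, portrayal = classify_portrayal(story)
        counts[group][portrayal] += 1
    return {
        group: c["subordinated"] / max(c["empowered"], 1)  # avoid divide-by-zero
        for group, c in counts.items()
    }


# Example "laissez-faire" prompt: open-ended, with identity left unspecified.
# ratios = portrayal_ratios("Write a short story about a student learning physics.")
```

Ratios of this kind are what the abstract refers to when it reports that minoritized individuals are hundreds to thousands of times more likely to encounter subordinated than empowering portrayals.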
Related papers
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z) - QueerBench: Quantifying Discrimination in Language Models Toward Queer Identities [4.82206141686275]
We assess the potential harm caused by sentence completions generated by English large language models concerning LGBTQIA+ individuals.
The analysis indicates that large language models tend to exhibit discriminatory behaviour more frequently towards individuals within the LGBTQIA+ community.
arXiv Detail & Related papers (2024-06-18T08:40:29Z) - The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language Models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z) - Protected group bias and stereotypes in Large Language Models [2.1122940074160357]
This paper investigates the behavior of Large Language Models (LLMs) in the domains of ethics and fairness.
We find bias against minoritized groups, particularly in the domains of gender and sexuality, as well as a Western bias.
arXiv Detail & Related papers (2024-03-21T00:21:38Z) - Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria [4.738231680800414]
This paper studies how to uncover and quantify the presence of gender biases in generative language models.
We derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation, and sufficiency (their standard classification forms are written out after this list).
Our results reveal the presence of occupational gender bias within such conversational language models.
arXiv Detail & Related papers (2024-03-13T14:19:08Z) - Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models [33.157279170602784]
We present Marked Personas, a prompt-based method to measure stereotypes in large language models (LLMs).
We find that portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts.
An intersectional lens reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women.
arXiv Detail & Related papers (2023-05-29T16:29:22Z) - "I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation [69.25368160338043]
Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life.
We assess how the social reality surrounding experienced marginalization of TGNB persons contributes to and persists within Open Language Generation.
We introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community.
arXiv Detail & Related papers (2023-05-17T04:21:45Z) - Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find that a broad range of ordinary prompts produces stereotypes, including prompts that simply mention traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) risk manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z)
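For reference, the three criteria named in the "Generalizing Fairness" entry above are standard in the classification-fairness literature. The block below is a minimal LaTeX rendering of their usual conditional-independence forms (notation is ours: Y-hat the model prediction, Y the true outcome, A the protected identity attribute), not the paper's generative reformulation.

```latex
\documentclass{article}
\usepackage{amsmath}
% Standard non-discrimination criteria from classification, which the
% "Generalizing Fairness" entry above adapts to generative language models.
% \hat{Y}: model prediction, Y: true outcome, A: protected identity attribute.
\begin{document}
\begin{align*}
  \text{Independence:} \quad & \hat{Y} \perp A \\
  \text{Separation:}   \quad & \hat{Y} \perp A \mid Y \\
  \text{Sufficiency:}  \quad & Y \perp A \mid \hat{Y}
\end{align*}
\end{document}
```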
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.