Related papers: QueerBench: Quantifying Discrimination in Language Models Toward Queer Identities

URL: http://arxiv.org/abs/2406.12399v1
Date: Tue, 18 Jun 2024 08:40:29 GMT
Title: QueerBench: Quantifying Discrimination in Language Models Toward Queer Identities
Authors: Mae Sosto, Alberto Barrón-Cedeño
Abstract summary: We assess the potential harm caused by sentence completions generated by English large language models concerning LGBTQIA+ individuals. The analysis indicates that large language models tend to exhibit discriminatory behaviour more frequently towards individuals within the LGBTQIA+ community.
Score: 4.82206141686275
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: With the increasing role of Natural Language Processing (NLP) in various applications, challenges concerning bias and stereotype perpetuation are accentuated, which often leads to hate speech and harm. Despite existing studies on sexism and misogyny, issues like homophobia and transphobia remain underexplored and often adopt binary perspectives, putting the safety of LGBTQIA+ individuals at high risk in online spaces. In this paper, we assess the potential harm caused by sentence completions generated by English large language models (LLMs) concerning LGBTQIA+ individuals. This is achieved using QueerBench, our new assessment framework, which employs a template-based approach and a Masked Language Modeling (MLM) task. The analysis indicates that large language models tend to exhibit discriminatory behaviour more frequently towards individuals within the LGBTQIA+ community, reaching a difference gap of 7.2% in the QueerBench score of harmfulness.

Related papers

QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks [0.38887448816036313]
We investigate whether explicit information about a subject's gender or sexuality influences responses across three subject categories. Our findings show that Masked Language Models (MLMs) produce the least favorable sentiment, higher toxicity, and more negative regard for queer-marked subjects.
arXiv Detail & Related papers (2026-01-28T16:06:04Z)
EuroGEST: Investigating gender stereotypes in multilingual language models [53.88459905621724]
Large language models increasingly support multiple languages, yet most benchmarks for gender bias remain English-centric. We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages.
arXiv Detail & Related papers (2025-06-04T11:58:18Z)
LLMs Reproduce Stereotypes of Sexual and Gender Minorities [7.068680287596106]
We study the biases of large language models towards sexual and gender minorities beyond binary categories. Our analysis shows that LLMs generate stereotyped representations of sexual and gender minorities in creative writing.
arXiv Detail & Related papers (2025-01-10T12:46:39Z)
The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning. We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z)
The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification [57.06913662622832]
Gender-fair language fosters inclusion by addressing all genders or using neutral forms. Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns. While we offer initial insights on the effect on German text classification, the findings likely apply to other languages.
arXiv Detail & Related papers (2024-09-26T15:08:17Z)
Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders. This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words). We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
Refusal as Silence: Gendered Disparities in Vision-Language Model Responses [0.4199844472131921]
This study investigates refusal as a sociotechnical outcome through a counterfactual persona design. We find that transgender and non-binary personas experience significantly higher refusal rates, even in non-harmful contexts.
arXiv Detail & Related papers (2024-06-12T13:52:30Z)
Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias [8.168722337906148]
This study investigates the presence of bias in harmful speech classification of gender-queer dialect online. We introduce a novel dataset, QueerLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs. We systematically evaluate the performance of five off-the-shelf language models in assessing the harm of these texts.
arXiv Detail & Related papers (2024-05-23T18:07:28Z)
Laissez-Faire Harms: Algorithmic Biases in Generative Language Models [0.0]
We show that synthetically generated texts from five of the most pervasive LMs perpetuate harms of omission, subordination, and stereotyping for minoritized individuals. We find widespread evidence of bias to an extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs. Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models.
arXiv Detail & Related papers (2024-04-11T05:09:03Z)
Stereotypes and Smut: The (Mis)representation of Non-cisgender Identities by Text-to-Image Models [6.92043136971035]
We investigate how multimodal models handle diverse gender identities. We find certain non-cisgender identities are consistently (mis)represented as less human, more stereotyped and more sexualised. These improvements could pave the way for a future where change is led by the affected community.
arXiv Detail & Related papers (2023-05-26T16:28:49Z)
"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation [69.25368160338043]
Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life. We assess how the social reality surrounding experienced marginalization of TGNB persons contributes to and persists within Open Language Generation. We introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community.
arXiv Detail & Related papers (2023-05-17T04:21:45Z)
Detection of Homophobia & Transphobia in Dravidian Languages: Exploring Deep Learning Methods [1.5687561161428403]
Homophobia and transphobia constitute offensive comments against the LGBT+ community. The paper explores the applicability of different deep learning models for classifying social media comments in the Malayalam and Tamil languages.
arXiv Detail & Related papers (2023-04-03T12:15:27Z)
Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases. We propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
A Framework for the Computational Linguistic Analysis of Dehumanization [52.735780962665814]
We analyze discussions of LGBTQ people in the New York Times from 1986 to 2015. We find increasingly humanizing descriptions of LGBTQ people over time. The ability to analyze dehumanizing language at a large scale has implications for automatically detecting and understanding media bias as well as abusive language online.
arXiv Detail & Related papers (2020-03-06T03:02:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.