Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias
- URL: http://arxiv.org/abs/2406.00020v2
- Date: Fri, 21 Jun 2024 18:01:06 GMT
- Title: Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias
- Authors: Rebecca Dorn, Lee Kezar, Fred Morstatter, Kristina Lerman
- Abstract summary: This study investigates the presence of bias in harmful speech classification of gender-queer dialect online.
We introduce a novel dataset, QueerReclaimLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs.
We systematically evaluate the performance of five off-the-shelf language models in assessing the harm of these texts.
- Score: 8.168722337906148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Content moderation on social media platforms shapes the dynamics of online discourse, influencing whose voices are amplified and whose are suppressed. Recent studies have raised concerns about the fairness of content moderation practices, particularly for aggressively flagging posts from transgender and non-binary individuals as toxic. In this study, we investigate the presence of bias in harmful speech classification of gender-queer dialect online, focusing specifically on the treatment of reclaimed slurs. We introduce a novel dataset, QueerReclaimLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs. Dataset instances are scored by gender-queer annotators for potential harm depending on additional context about speaker identity. We systematically evaluate the performance of five off-the-shelf language models in assessing the harm of these texts and explore the effectiveness of chain-of-thought prompting to teach large language models (LLMs) to leverage author identity context. We reveal a tendency for these models to inaccurately flag texts authored by gender-queer individuals as harmful. Strikingly, across all LLMs the performance is poorest for texts that show signs of being written by individuals targeted by the featured slur (F1 <= 0.24). We highlight an urgent need for fairness and inclusivity in content moderation systems. By uncovering these biases, this work aims to inform the development of more equitable content moderation practices and contribute to the creation of inclusive online spaces for all users.
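As a rough illustration of the evaluation described in the abstract, the sketch below builds a chain-of-thought prompt that exposes author identity context, asks an off-the-shelf chat model for a harm verdict, and scores predictions against the gender-queer annotators' labels with F1. The prompt wording, model name, and label parsing are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch: chain-of-thought harm classification with author identity
# context, scored with F1 against annotator labels. Prompt text, model name,
# and parsing are assumptions for illustration only.
from openai import OpenAI
from sklearn.metrics import f1_score

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def build_prompt(text: str, author_is_ingroup: bool) -> str:
    """Chain-of-thought prompt that includes speaker identity context."""
    identity = (
        "The author self-identifies as a member of the LGBTQ+ group "
        "referenced by the slur." if author_is_ingroup
        else "The author's identity is unknown."
    )
    return (
        f'Post: "{text}"\n'
        f"Context: {identity}\n"
        "Reason step by step about whether the slur is used in a reclaimed, "
        "non-derogatory way, then answer on a final line with exactly "
        "HARMFUL or NOT HARMFUL."
    )

def classify(text: str, author_is_ingroup: bool, model: str = "gpt-4o-mini") -> int:
    """Return 1 if the model's final line says HARMFUL, else 0."""
    resp = client.chat.completions.create(
        model=model,  # illustrative model name, not one evaluated in the paper
        messages=[{"role": "user",
                   "content": build_prompt(text, author_is_ingroup)}],
    )
    final_line = resp.choices[0].message.content.strip().splitlines()[-1].upper()
    return 0 if "NOT HARMFUL" in final_line else int("HARMFUL" in final_line)

def evaluate(instances):
    """instances: (text, author_is_ingroup, gold_label) triples, where
    gold_label is 1 if annotators judged the text harmful in that context."""
    gold = [label for _, _, label in instances]
    pred = [classify(text, ingroup) for text, ingroup, _ in instances]
    return f1_score(gold, pred)  # the paper reports F1 <= 0.24 on in-group texts
```

Forcing the verdict onto a final line keeps the model's reasoning free-form while making the label trivial to parse.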
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs)
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z)
- Breaking the Silence Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces [0.6543929004971272]
Team CNLP-NITS-PP developed an ensemble approach combining CNN and BiLSTM networks.
The CNN captures localized features indicative of abusive language through convolution filters applied to the embedded input text.
The BiLSTM analyzes the sequence for dependencies among words and phrases.
Validation scores showed strong performance across F1 measures, especially for English (0.84).
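A minimal PyTorch sketch of this kind of CNN + BiLSTM combination is shown below; the layer sizes and the concatenation of the two branches are assumptions, not the team's exact architecture.

```python
# Hedged sketch of a CNN + BiLSTM abuse classifier; hyperparameters and the
# way the two branches are fused are illustrative assumptions.
import torch
import torch.nn as nn

class CnnBiLstmClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 100,
                 num_filters: int = 64, hidden_dim: int = 64,
                 num_classes: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # CNN branch: convolution filters over embedded tokens pick up
        # local n-gram cues indicative of abusive language.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        # BiLSTM branch: models longer-range dependencies among words and phrases.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(num_filters + 2 * hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        emb = self.embedding(token_ids)                         # (B, T, E)
        cnn_feat = torch.relu(self.conv(emb.transpose(1, 2)))   # (B, F, T)
        cnn_feat = cnn_feat.max(dim=2).values                   # global max pool
        lstm_out, _ = self.bilstm(emb)                          # (B, T, 2H)
        lstm_feat = lstm_out[:, -1, :]                          # final time step
        return self.classifier(torch.cat([cnn_feat, lstm_feat], dim=1))

# Example forward pass on a batch of 8 padded sequences of length 64:
# logits = CnnBiLstmClassifier(vocab_size=30000)(torch.randint(1, 30000, (8, 64)))
```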
arXiv Detail & Related papers (2024-04-02T14:55:47Z)
- "Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters [97.11173801187816]
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content.
This paper critically examines gender biases in LLM-generated reference letters.
arXiv Detail & Related papers (2023-10-13T16:12:57Z)
- "I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation [69.25368160338043]
Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life.
We assess how the social reality surrounding experienced marginalization of TGNB persons contributes to and persists within Open Language Generation.
We introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community.
arXiv Detail & Related papers (2023-05-17T04:21:45Z)
- Gender Bias in Text: Labeled Datasets and Lexicons [0.30458514384586394]
Labeled datasets and lexicons for automating the detection of gender bias are scarce.
We provide labeled datasets and exhaustive lexicons by collecting, annotating, and augmenting relevant sentences.
The released datasets and lexicons span multiple bias subtypes including: Generic He, Generic She, Explicit Marking of Sex, and Gendered Neologisms.
arXiv Detail & Related papers (2022-01-21T12:44:51Z)
- Text Style Transfer for Bias Mitigation using Masked Language Modeling [9.350763916068026]
We present a text style transfer model that can be used to automatically debias textual data.
The model combines latent content encoding with explicit keyword replacement.
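A simplified sketch of the keyword-replacement-plus-masked-LM idea is given below, using a generic fill-mask pipeline; the gendered-term list and model choice are illustrative assumptions, and the paper's full model additionally relies on latent content encoding rather than a plain lexicon lookup.

```python
# Hedged sketch: mask gendered keywords and re-fill them with a masked LM.
# The lexicon, model, and tokenization here are illustrative assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical mini-lexicon of gendered terms to neutralize.
GENDERED_TERMS = {"he", "she", "him", "her", "his", "hers",
                  "chairman", "chairwoman"}

def debias_sentence(sentence: str) -> str:
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        core = tok.rstrip(".,!?")
        if core.lower() not in GENDERED_TERMS:
            continue
        suffix = tok[len(core):]
        masked = tokens.copy()
        masked[i] = fill_mask.tokenizer.mask_token
        # Take the highest-scoring infill that is not itself gendered.
        for cand in fill_mask(" ".join(masked)):
            word = cand["token_str"].strip()
            if word.lower() not in GENDERED_TERMS:
                tokens[i] = word + suffix
                break
    return " ".join(tokens)

print(debias_sentence("She is a great leader and her team trusts her."))
```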
arXiv Detail & Related papers (2022-01-21T11:06:33Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) risk manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Towards Equal Gender Representation in the Annotations of Toxic Language Detection [6.129776019898014]
We study the differences in the ways men and women annotate comments for toxicity.
We find that the BERT model associates toxic comments containing offensive words with male annotators, causing the model to predict 67.7% of toxic comments as having been annotated by men.
We show that this disparity between gender predictions can be mitigated by removing offensive words and highly toxic comments from the training data.
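A minimal sketch of that mitigation step is shown below, assuming a hypothetical offensive-word list and toxicity threshold: highly toxic comments are dropped and offensive words are stripped from the remaining training data before the annotator-gender classifier is trained.

```python
# Hedged sketch of the training-data filtering described above; the word list
# and cutoff are placeholder assumptions, not the paper's settings.
import re

OFFENSIVE_WORDS = {"idiot", "stupid", "moron"}   # hypothetical placeholder lexicon
TOXICITY_CUTOFF = 0.8                            # hypothetical threshold

def clean_training_data(rows):
    """rows: iterable of dicts with 'text', 'toxicity', 'annotator_gender'."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, OFFENSIVE_WORDS)) + r")\b",
        flags=re.IGNORECASE,
    )
    cleaned = []
    for row in rows:
        if row["toxicity"] >= TOXICITY_CUTOFF:
            continue  # discard highly toxic comments entirely
        text = pattern.sub("", row["text"])
        cleaned.append({**row, "text": " ".join(text.split())})
    return cleaned
```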
arXiv Detail & Related papers (2021-06-04T00:12:38Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)