Towards Equal Gender Representation in the Annotations of Toxic Language
Detection
- URL: http://arxiv.org/abs/2106.02183v1
- Date: Fri, 4 Jun 2021 00:12:38 GMT
- Title: Towards Equal Gender Representation in the Annotations of Toxic Language
Detection
- Authors: Elizabeth Excell and Noura Al Moubayed
- Abstract summary: We study the differences in the ways men and women annotate comments for toxicity.
We find that the BERT model associates toxic comments containing offensive words with male annotators, causing the model to predict 67.7% of toxic comments as having been annotated by men.
We show that this disparity between gender predictions can be mitigated by removing offensive words and highly toxic comments from the training data.
- Score: 6.129776019898014
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Classifiers tend to propagate biases present in the data on which they are
trained. Hence, it is important to understand how the demographic identities of
the annotators of comments affect the fairness of the resulting model. In this
paper, we focus on the differences in the ways men and women annotate comments
for toxicity, investigating how these differences result in models that amplify
the opinions of male annotators. We find that the BERT model associates toxic
comments containing offensive words with male annotators, causing the model to
predict 67.7% of toxic comments as having been annotated by men. We show that
this disparity between gender predictions can be mitigated by removing
offensive words and highly toxic comments from the training data. We then apply
the learned associations between gender and language to toxic language
classifiers, finding that models trained exclusively on female-annotated data
perform 1.8% better than those trained solely on male-annotated data and that
training models on data after removing all offensive words reduces bias in the
model by 55.5% while increasing the sensitivity by 0.4%.
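The mitigation described above amounts to a preprocessing step followed by retraining. The sketch below illustrates that pipeline under stated assumptions: the toy comments, the OFFENSIVE lexicon, and the 0.75 toxicity cutoff are invented for illustration, and a TF-IDF plus logistic-regression probe stands in for the paper's fine-tuned BERT model.

```python
# Minimal sketch of the paper's mitigation idea: drop highly toxic
# comments and mask offensive words before training a model that
# predicts the annotator's gender. Lexicon, threshold, and data are
# illustrative assumptions, not the paper's actual resources.
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy annotated comments: (text, toxicity in [0, 1], annotator gender).
ROWS = [
    ("you are a jerk and an idiot", 0.9, "male"),
    ("this argument makes no sense", 0.4, "female"),
    ("what a thoughtful reply", 0.0, "female"),
    ("get lost, moron", 0.8, "male"),
    ("i disagree but respect your view", 0.1, "female"),
    ("total idiot take", 0.7, "male"),
]

OFFENSIVE = {"jerk", "idiot", "moron"}  # hypothetical lexicon
TOXICITY_CUTOFF = 0.75                  # hypothetical threshold

def mask_offensive(text: str) -> str:
    """Replace lexicon words with a neutral placeholder token."""
    pattern = r"\b(" + "|".join(map(re.escape, OFFENSIVE)) + r")\b"
    return re.sub(pattern, "[MASK]", text)

# Step 1: drop highly toxic comments, mask offensive words in the rest.
filtered = [
    (mask_offensive(text), gender)
    for text, toxicity, gender in ROWS
    if toxicity < TOXICITY_CUTOFF
]
texts, genders = zip(*filtered)

# Step 2: train the annotator-gender probe on the cleaned data.
probe = make_pipeline(TfidfVectorizer(), LogisticRegression())
probe.fit(texts, genders)

print(probe.predict(["that was a [MASK] comment"]))
```

On the cleaned data, a probe like this should rely less on offensive tokens as a shortcut for predicting annotator gender, which is the disparity the abstract reports for the BERT model.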
Related papers
- Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias [8.168722337906148]
This study investigates the presence of bias in harmful speech classification of gender-queer dialect online.
We introduce a novel dataset, QueerLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs.
We systematically evaluate the performance of five off-the-shelf language models in assessing the harm of these texts.
arXiv Detail & Related papers (2024-05-23T18:07:28Z)
- Are Models Biased on Text without Gender-related Language? [14.931375031931386]
We introduce UnStereoEval (USE), a novel framework for investigating gender bias in stereotype-free scenarios.
USE defines a sentence-level score based on pretraining data statistics to determine whether a sentence contains minimal word-gender associations.
We find low fairness across all 28 tested models, suggesting that bias does not solely stem from the presence of gender-related words.
arXiv Detail & Related papers (2024-05-01T15:51:15Z)
- DiFair: A Benchmark for Disentangled Assessment of Gender Knowledge and Bias [13.928591341824248]
Debiasing techniques have been proposed to mitigate the gender bias that is prevalent in pretrained language models.
These are often evaluated on datasets that check the extent to which the model is gender-neutral in its predictions.
This evaluation protocol overlooks the possible adverse impact of bias mitigation on useful gender knowledge.
arXiv Detail & Related papers (2023-10-22T15:27:16Z)
- Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts [87.62403265382734]
Recent studies show that traditional fairytales are rife with harmful gender biases.
This work aims to assess learned biases of language models by evaluating their robustness against gender perturbations.
arXiv Detail & Related papers (2023-10-16T22:25:09Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal [74.52580517012832]
Language models can encode gender bias, for example by associating male- or female-specific knowledge with gender-neutral words.
We present a novel approach to mitigating gender disparity based on counterfactual role reversal.
We observe that reducing the gender polarity of model language does not necessarily improve fairness or downstream classification performance.
arXiv Detail & Related papers (2022-03-23T17:34:35Z)
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)