Towards Equal Gender Representation in the Annotations of Toxic Language
Detection
- URL: http://arxiv.org/abs/2106.02183v1
- Date: Fri, 4 Jun 2021 00:12:38 GMT
- Title: Towards Equal Gender Representation in the Annotations of Toxic Language
Detection
- Authors: Elizabeth Excell and Noura Al Moubayed
- Abstract summary: We study the differences in the ways men and women annotate comments for toxicity.
We find that the BERT model associates toxic comments containing offensive words with male annotators, causing the model to predict 67.7% of toxic comments as having been annotated by men.
We show that this disparity between gender predictions can be mitigated by removing offensive words and highly toxic comments from the training data.
- Score: 6.129776019898014
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Classifiers tend to propagate biases present in the data on which they are
trained. Hence, it is important to understand how the demographic identities of
the annotators of comments affect the fairness of the resulting model. In this
paper, we focus on the differences in the ways men and women annotate comments
for toxicity, investigating how these differences result in models that amplify
the opinions of male annotators. We find that the BERT model associates toxic
comments containing offensive words with male annotators, causing the model to
predict 67.7% of toxic comments as having been annotated by men. We show that
this disparity between gender predictions can be mitigated by removing
offensive words and highly toxic comments from the training data. We then apply
the learned associations between gender and language to toxic language
classifiers, finding that models trained exclusively on female-annotated data
perform 1.8% better than those trained solely on male-annotated data and that
training models on data after removing all offensive words reduces bias in the
model by 55.5% while increasing the sensitivity by 0.4%.
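The mitigation described above amounts to a preprocessing step followed by retraining. The sketch below illustrates that pipeline under stated assumptions: the toy comments, the OFFENSIVE lexicon, and the 0.75 toxicity cutoff are invented for illustration, and a TF-IDF plus logistic-regression probe stands in for the paper's fine-tuned BERT model.

```python
# Minimal sketch of the paper's mitigation idea: drop highly toxic
# comments and mask offensive words before training a model that
# predicts the annotator's gender. Lexicon, threshold, and data are
# illustrative assumptions, not the paper's actual resources.
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy annotated comments: (text, toxicity in [0, 1], annotator gender).
ROWS = [
    ("you are a jerk and an idiot", 0.9, "male"),
    ("this argument makes no sense", 0.4, "female"),
    ("what a thoughtful reply", 0.0, "female"),
    ("get lost, moron", 0.8, "male"),
    ("i disagree but respect your view", 0.1, "female"),
    ("total idiot take", 0.7, "male"),
]

OFFENSIVE = {"jerk", "idiot", "moron"}  # hypothetical lexicon
TOXICITY_CUTOFF = 0.75                  # hypothetical threshold

def mask_offensive(text: str) -> str:
    """Replace lexicon words with a neutral placeholder token."""
    pattern = r"\b(" + "|".join(map(re.escape, OFFENSIVE)) + r")\b"
    return re.sub(pattern, "[MASK]", text)

# Step 1: drop highly toxic comments, mask offensive words in the rest.
filtered = [
    (mask_offensive(text), gender)
    for text, toxicity, gender in ROWS
    if toxicity < TOXICITY_CUTOFF
]
texts, genders = zip(*filtered)

# Step 2: train the annotator-gender probe on the cleaned data.
probe = make_pipeline(TfidfVectorizer(), LogisticRegression())
probe.fit(texts, genders)

print(probe.predict(["that was a [MASK] comment"]))
```

On the cleaned data, a probe like this should rely less on offensive tokens as a shortcut for predicting annotator gender, which is the disparity the abstract reports for the BERT model.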
Related papers
- Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias [8.168722337906148]
This study investigates the presence of bias in harmful speech classification of gender-queer dialect online.
We introduce a novel dataset, QueerLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs.
We systematically evaluate the performance of five off-the-shelf language models in assessing the harm of these texts.
arXiv Detail & Related papers (2024-05-23T18:07:28Z)
- Are Models Biased on Text without Gender-related Language? [14.931375031931386]
We introduce UnStereoEval (USE), a novel framework for investigating gender bias in stereotype-free scenarios.
USE defines a sentence-level score based on pretraining data statistics to determine whether a sentence contains minimal word-gender associations.
We find low fairness across all 28 tested models, suggesting that bias does not solely stem from the presence of gender-related words.
arXiv Detail & Related papers (2024-05-01T15:51:15Z)
- DiFair: A Benchmark for Disentangled Assessment of Gender Knowledge and Bias [13.928591341824248]
Debiasing techniques have been proposed to mitigate the gender bias that is prevalent in pretrained language models.
These are often evaluated on datasets that check the extent to which the model is gender-neutral in its predictions.
This evaluation protocol overlooks the possible adverse impact of bias mitigation on useful gender knowledge.
arXiv Detail & Related papers (2023-10-22T15:27:16Z)
- Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts [87.62403265382734]
Recent studies show that traditional fairytales are rife with harmful gender biases.
This work aims to assess learned biases of language models by evaluating their robustness against gender perturbations.
arXiv Detail & Related papers (2023-10-16T22:25:09Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal [74.52580517012832]
Language models can encode gender bias, for example by associating male- or female-specific knowledge with gender-neutral words.
We present a novel approach to mitigating gender disparity based on counterfactual role reversal.
We observe that reducing the gender polarity of model language does not necessarily improve fairness or downstream classification performance.
arXiv Detail & Related papers (2022-03-23T17:34:35Z)
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)