Mitigating Racial Biases in Toxic Language Detection with an
Equity-Based Ensemble Framework
- URL: http://arxiv.org/abs/2109.13137v1
- Date: Mon, 27 Sep 2021 15:54:05 GMT
- Title: Mitigating Racial Biases in Toxic Language Detection with an
Equity-Based Ensemble Framework
- Authors: Matan Halevy, Camille Harris, Amy Bruckman, Diyi Yang, Ayanna Howard
- Abstract summary: Recent research has demonstrated how racial biases against users who write African American English exist in popular toxic language datasets.
We propose additional descriptive fairness metrics to better understand the source of these biases.
We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets.
- Score: 9.84413545378636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research has demonstrated how racial biases against users who write
African American English exist in popular toxic language datasets. While
previous work has focused on a single fairness criterion, we propose to use
additional descriptive fairness metrics to better understand the source of
these biases. We demonstrate that different benchmark classifiers, as well as
two in-process bias-remediation techniques, propagate racial biases even in a
larger corpus. We then propose a novel ensemble framework that uses a
specialized classifier that is fine-tuned to the African American English
dialect. We show that our proposed framework substantially reduces the racial
biases that the model learns from these datasets. We demonstrate how the
ensemble framework improves fairness metrics across all sample datasets with
minimal impact on classification performance, and provide empirical evidence
of its ability to unlearn the annotation biases towards authors who
use African American English.
** Please note that this work may contain examples of offensive words and
phrases.
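As a rough illustration of the ensemble idea described in the abstract, the sketch below routes each text either to a general toxicity classifier or to a classifier fine-tuned on African American English, based on an estimated dialect probability. The callable interfaces, the routing rule, and the `aae_threshold` value are assumptions for illustration, not the authors' exact design.

```python
from dataclasses import dataclass
from typing import Callable

# Each component is assumed to expose a simple callable interface:
#   classifier(text) -> probability that the text is toxic
#   dialect_estimator(text) -> probability that the text is written in AAE
ToxicityClassifier = Callable[[str], float]
DialectEstimator = Callable[[str], float]


@dataclass
class EquityEnsemble:
    """Route each text to a general or an AAE-specialized toxicity classifier."""
    general_clf: ToxicityClassifier        # trained on the full corpus
    aae_clf: ToxicityClassifier            # fine-tuned on AAE-dialect examples
    dialect_estimator: DialectEstimator    # e.g., posterior from a dialect model
    aae_threshold: float = 0.5             # illustrative routing threshold

    def predict_proba(self, text: str) -> float:
        p_aae = self.dialect_estimator(text)
        if p_aae >= self.aae_threshold:
            # Defer to the specialized classifier for likely-AAE texts.
            return self.aae_clf(text)
        return self.general_clf(text)

    def predict(self, text: str, toxic_threshold: float = 0.5) -> bool:
        return self.predict_proba(text) >= toxic_threshold


# Toy usage with constant stand-ins for the three components.
ens = EquityEnsemble(
    general_clf=lambda t: 0.7,
    aae_clf=lambda t: 0.2,
    dialect_estimator=lambda t: 0.9,
)
print(ens.predict("example tweet"))  # False: routed to the specialized classifier
```

A soft variant could blend the two classifiers' scores in proportion to the estimated dialect probability instead of hard routing.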
Related papers
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- Towards Better Inclusivity: A Diverse Tweet Corpus of English Varieties [0.0]
We aim to address the issue of bias at its root - the data itself.
We curate a dataset of tweets from countries with high proportions of underserved English variety speakers.
Following best annotation practices, our growing corpus features 170,800 tweets taken from 7 countries.
arXiv Detail & Related papers (2024-01-21T13:18:20Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
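The projection step described above has a standard linear-algebra form; here is a minimal sketch, assuming the bias directions have already been estimated (for example, from differences of embeddings of paired prompts) and omitting the calibration the paper applies to the projection matrix.

```python
import numpy as np

def debias_projection(bias_directions: np.ndarray) -> np.ndarray:
    """Build P = I - V (V^T V)^{-1} V^T, which projects out span(bias_directions).

    bias_directions: array of shape (k, d), one estimated bias direction per row.
    """
    V = bias_directions.T                      # shape (d, k)
    return np.eye(V.shape[0]) - V @ np.linalg.pinv(V.T @ V) @ V.T

# Toy usage: remove a single (hypothetical) bias direction from text embeddings.
rng = np.random.default_rng(0)
d = 16
bias_dir = rng.normal(size=(1, d))             # assumed bias direction
text_embeddings = rng.normal(size=(8, d))      # assumed CLIP-style text embeddings

P = debias_projection(bias_dir)
debiased = text_embeddings @ P.T               # projected embeddings

# Components along the bias direction are now (numerically) zero.
print(np.abs(debiased @ bias_dir.T).max())
```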
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
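A minimal sketch of a prediction-sensitivity probe in this spirit, assuming a differentiable classifier over tabular features and a hand-specified weight vector marking features tied to the protected attribute; the precise weighting and aggregation used by the paper's metric may differ.

```python
import torch

def accumulated_prediction_sensitivity(model, X, feature_weights):
    """Average |w . d pred / d x| over a batch: how much predictions move when
    protected-attribute-related features are perturbed.

    model: maps an (n, d) float tensor to (n,) probabilities.
    X: (n, d) input batch.
    feature_weights: (d,) non-negative weights, larger for features more
        strongly tied to the protected attribute (an assumption here).
    """
    X = X.clone().requires_grad_(True)
    preds = model(X)                                 # (n,)
    grads = torch.autograd.grad(preds.sum(), X)[0]   # (n, d) per-example gradients
    per_example = (grads * feature_weights).sum(dim=1).abs()
    return per_example.mean()

# Toy usage with a hypothetical logistic model.
torch.manual_seed(0)
d = 5
w = torch.randn(d)
model = lambda X: torch.sigmoid(X @ w)
X = torch.randn(32, d)
feature_weights = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])  # feature 2 ~ protected attribute
print(float(accumulated_prediction_sensitivity(model, X, feature_weights)))
```

A smaller value indicates predictions that move less when protected-attribute-related features are perturbed, which is the direction the paper connects to statistical parity and individual fairness.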
- Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm that maps individuals from different groups into a single, shared group.
We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z)
- Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models [2.567384209291337]
An increasing awareness of biased patterns in natural language processing resources has motivated many metrics to quantify "bias" and "fairness".
We survey the existing literature on fairness metrics for pretrained language models and experimentally evaluate compatibility.
We find that many metrics are not compatible and highly depend on (i) templates, (ii) attribute and target seeds and (iii) the choice of embeddings.
arXiv Detail & Related papers (2021-12-14T15:04:56Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate on both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- The Authors Matter: Understanding and Mitigating Implicit Bias in Deep Text Classification [36.361778457307636]
Deep text classification models can produce biased outcomes for texts written by authors of certain demographic groups.
In this paper, we first demonstrate that implicit bias exists in different text classification tasks for different demographic groups.
We then build a learning-based interpretation method to deepen our knowledge of implicit bias.
arXiv Detail & Related papers (2021-05-06T16:17:38Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical markers (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
- Hate Speech Detection and Racial Bias Mitigation in Social Media based on BERT model [1.9336815376402716]
We introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model called BERT.
We evaluate the proposed model on two publicly available datasets annotated for racism, sexism, hate or offensive content on Twitter.
arXiv Detail & Related papers (2020-08-14T16:47:25Z)
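A minimal sketch of that kind of BERT-based transfer learning with the Hugging Face `transformers` API; the checkpoint name, three-way label scheme, and toy data are assumptions for illustration rather than the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Start from a pre-trained encoder and add a fresh classification head.
model_name = "bert-base-uncased"                     # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
# Labels here are an assumption: 0 = neither, 1 = offensive, 2 = hate.

texts = ["example tweet one", "example tweet two"]   # placeholder data
labels = torch.tensor([0, 1])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                                   # a few illustrative steps
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=labels)              # loss computed internally
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```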
- Examining Racial Bias in an Online Abuse Corpus with Structural Topic Modeling [0.30458514384586405]
We use structural topic modeling to examine racial bias in social media posts.
We augment the abusive language dataset by adding an additional feature indicating the predicted probability of the tweet being written in African-American English.
arXiv Detail & Related papers (2020-05-26T21:02:43Z)
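A minimal sketch of that augmentation step, assuming some trained dialect model sits behind `predict_aae_probability` (the constant stand-in below is only so the example runs) and that the abusive language dataset is a dataframe with a `tweet` column; both names are illustrative assumptions.

```python
import pandas as pd

def predict_aae_probability(text: str) -> float:
    """Stand-in for a trained dialect model's posterior P(AAE | text)."""
    return 0.5  # replace with a real dialect model's prediction

def add_aae_feature(df: pd.DataFrame, text_col: str = "tweet") -> pd.DataFrame:
    """Append the predicted AAE probability as an extra feature column."""
    out = df.copy()
    out["p_aae"] = out[text_col].apply(predict_aae_probability)
    return out

# Toy usage on an assumed abusive-language dataframe.
df = pd.DataFrame({"tweet": ["example tweet a", "example tweet b"], "abusive": [0, 1]})
print(add_aae_feature(df))
```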
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.