Related papers: Demographics Should Not Be the Reason of Toxicity: Mitigating Discrimination in Text Classifications with Instance Weighting

Demographics Should Not Be the Reason of Toxicity: Mitigating Discrimination in Text Classifications with Instance Weighting

URL: http://arxiv.org/abs/2004.14088v3
Date: Thu, 20 Aug 2020 14:22:11 GMT
Title: Demographics Should Not Be the Reason of Toxicity: Mitigating Discrimination in Text Classifications with Instance Weighting
Authors: Guanhua Zhang, Bing Bai, Junqi Zhang, Kun Bai, Conghui Zhu and Tiejun Zhao
Abstract summary: We formalize the unintended biases in text classification datasets as a kind of selection bias from the non-discrimination distribution to the discrimination distribution. Our method can effectively alleviate the impacts of the unintended biases without significantly hurting models' generalization ability.
Score: 36.87473475196733
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the recent proliferation of the use of text classifications, researchers have found that there are certain unintended biases in text classification datasets. For example, texts containing some demographic identity-terms (e.g., "gay", "black") are more likely to be abusive in existing abusive language detection datasets. As a result, models trained with these datasets may consider sentences like "She makes me happy to be gay" as abusive simply because of the word "gay." In this paper, we formalize the unintended biases in text classification datasets as a kind of selection bias from the non-discrimination distribution to the discrimination distribution. Based on this formalization, we further propose a model-agnostic debiasing training framework by recovering the non-discrimination distribution using instance weighting, which does not require any extra resources or annotations apart from a pre-defined set of demographic identity-terms. Experiments demonstrate that our method can effectively alleviate the impacts of the unintended biases without significantly hurting models' generalization ability.

Related papers

FairUDT: Fairness-aware Uplift Decision Trees [2.605892372263285]
We propose FairUDT, a fairness-aware Uplift-based Decision Tree for discrimination identification. We show how the integration of uplift modeling with decision trees can be adapted to include fair splitting criteria. We also show that FairUDT is inherently interpretable and can be utilized in discrimination detection tasks.
arXiv Detail & Related papers (2025-02-03T09:24:01Z)
Unlabeled Debiasing in Downstream Tasks via Class-wise Low Variance Regularization [13.773597081543185]
We introduce a novel debiasing regularization technique based on the class-wise variance of embeddings. Our method does not require attribute labels and targets any attribute, thus addressing the shortcomings of existing debiasing methods.
arXiv Detail & Related papers (2024-09-29T03:56:50Z)
Language-guided Detection and Mitigation of Unknown Dataset Bias [23.299264313976213]
We propose a framework to identify potential biases as keywords without prior knowledge based on the partial occurrence in the captions. Our framework not only outperforms existing methods without prior knowledge, but also is even comparable with a method that assumes prior knowledge.
arXiv Detail & Related papers (2024-06-05T03:11:33Z)
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models. We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z)
Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness [15.210232622716129]
Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes. Data augmentation reduces gender bias by adding counterfactual examples to the training dataset. We show that some of the examples in the augmented dataset can be not important or even harmful for fairness.
arXiv Detail & Related papers (2022-11-20T22:42:30Z)
Detecting Unintended Social Bias in Toxic Language Datasets [32.724030288421474]
This paper introduces a new dataset ToxicBias curated from the existing dataset of Kaggle competition named "Jigsaw Unintended Bias in Toxicity Classification" The dataset contains instances annotated for five different bias categories, viz., gender, race/ethnicity, religion, political, and LGBTQ. We train transformer-based models using our curated datasets and report baseline performance for bias identification, target generation, and bias implications.
arXiv Detail & Related papers (2022-10-21T06:50:12Z)
Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and of agent architecture. We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z)
Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race. Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables. This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection. We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns. Our method yields lower false positive rate in both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
The Authors Matter: Understanding and Mitigating Implicit Bias in Deep Text Classification [36.361778457307636]
Deep text classification models can produce biased outcomes for texts written by authors of certain demographic groups. In this paper, we first demonstrate that implicit bias exists in different text classification tasks for different demographic groups. We then build a learning-based interpretation method to deepen our knowledge of implicit bias.
arXiv Detail & Related papers (2021-05-06T16:17:38Z)
Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models. We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups. We find that relying on a single score threshold to differentiate between genuine and imposters sample pairs leads to suboptimal results. We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.