The Authors Matter: Understanding and Mitigating Implicit Bias in Deep
Text Classification
- URL: http://arxiv.org/abs/2105.02778v1
- Date: Thu, 6 May 2021 16:17:38 GMT
- Title: The Authors Matter: Understanding and Mitigating Implicit Bias in Deep
Text Classification
- Authors: Haochen Liu, Wei Jin, Hamid Karimi, Zitao Liu and Jiliang Tang
- Abstract summary: Deep text classification models can produce biased outcomes for texts written by authors of certain demographic groups.
In this paper, we first demonstrate that implicit bias exists in different text classification tasks for different demographic groups.
We then build a learning-based interpretation method to deepen our knowledge of implicit bias.
- Score: 36.361778457307636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is evident that deep text classification models trained on human data
could be biased. In particular, they produce biased outcomes for texts that
explicitly include identity terms of certain demographic groups. We refer to
this type of bias as explicit bias, which has been extensively studied.
However, deep text classification models can also produce biased outcomes for
texts written by authors of certain demographic groups. We refer to such bias
as implicit bias, of which we still have a rather limited understanding. In this
paper, we first demonstrate that implicit bias exists in different text
classification tasks for different demographic groups. Then, we build a
learning-based interpretation method to deepen our knowledge of implicit bias.
Specifically, we verify that classifiers learn to make predictions based on
language features that are related to the demographic attributes of the
authors. Next, we propose a framework, Debiased-TC, to train deep text
classifiers to make predictions based on the right features and consequently mitigate
implicit bias. We conduct extensive experiments on three real-world datasets.
The results show that the text classification models trained under our proposed
framework outperform traditional models significantly in terms of fairness, and
also slightly in terms of classification performance.
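
The abstract reports that Debiased-TC improves fairness while slightly improving classification performance, but it does not name the fairness metrics used. Below is a minimal, hypothetical sketch (not the paper's code) that quantifies implicit bias as the gap in error rates between author demographic groups; `model`, `texts`, `labels`, and `author_groups` are assumed placeholders.

```python
from collections import defaultdict

def group_rates(preds, labels, groups):
    """Per-group false positive and false negative rates for binary labels."""
    stats = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for p, y, g in zip(preds, labels, groups):
        s = stats[g]
        if y == 1:
            s["pos"] += 1
            s["fn"] += int(p == 0)  # missed positive
        else:
            s["neg"] += 1
            s["fp"] += int(p == 1)  # false alarm
    return {
        g: {"fpr": s["fp"] / max(s["neg"], 1), "fnr": s["fn"] / max(s["pos"], 1)}
        for g, s in stats.items()
    }

def fairness_gaps(rates):
    """Largest between-group spread in FPR and FNR; 0.0 means parity."""
    fprs = [r["fpr"] for r in rates.values()]
    fnrs = [r["fnr"] for r in rates.values()]
    return max(fprs) - min(fprs), max(fnrs) - min(fnrs)

# Usage with a hypothetical trained classifier and evaluation data:
# preds = model.predict(texts)
# fpr_gap, fnr_gap = fairness_gaps(group_rates(preds, labels, author_groups))
```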
Related papers
- Less can be more: representational vs. stereotypical gender bias in facial expression recognition [3.9698529891342207]
Machine learning models can inherit biases from their training data, leading to discriminatory or inaccurate predictions.
This paper investigates the propagation of demographic biases from datasets into machine learning models.
We focus on the gender demographic component, analyzing two types of bias: representational and stereotypical.
arXiv Detail & Related papers (2024-06-25T09:26:49Z) - Language-guided Detection and Mitigation of Unknown Dataset Bias [23.299264313976213]
We propose a framework to identify potential biases as keywords, without prior knowledge, based on their partial occurrence in the captions.
Our framework not only outperforms existing methods that use no prior knowledge, but is even comparable with a method that assumes prior knowledge.
arXiv Detail & Related papers (2024-06-05T03:11:33Z) - GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language
Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models (a sketch of the projection idea appears after this list).
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - COFFEE: Counterfactual Fairness for Personalized Text Generation in
Explainable Recommendation [56.520470678876656]
Bias inherent in user-written text can associate different levels of linguistic quality with users' protected attributes.
We introduce a general framework to achieve measure-specific counterfactual fairness in explanation generation.
arXiv Detail & Related papers (2022-10-14T02:29:10Z) - Challenges in Measuring Bias via Open-Ended Language Generation [1.5552869983952944]
We analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results.
We provide recommendations for reporting biases in open-ended language generation.
arXiv Detail & Related papers (2022-05-23T19:57:15Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting (a sketch of instance reweighting appears after this list).
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z) - Demographics Should Not Be the Reason of Toxicity: Mitigating
Discrimination in Text Classifications with Instance Weighting [36.87473475196733]
We formalize the unintended biases in text classification datasets as a kind of selection bias from the non-discrimination distribution to the discrimination distribution.
Our method can effectively alleviate the impacts of the unintended biases without significantly hurting models' generalization ability.
arXiv Detail & Related papers (2020-04-29T11:22:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.