RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of
Conversational Language Models
- URL: http://arxiv.org/abs/2106.03521v1
- Date: Mon, 7 Jun 2021 11:22:39 GMT
- Title: RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of
Conversational Language Models
- Authors: Soumya Barikeri, Anne Lauscher, Ivan Vulić, and Goran Glavaš
- Abstract summary: Text representation models are prone to exhibit a range of societal biases.
Recent work has predominantly focused on measuring and mitigating bias in pretrained language models.
We present RedditBias, the first conversational data set grounded in actual human conversations from Reddit.
- Score: 37.98671828283487
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text representation models are prone to exhibit a range of societal biases,
reflecting the non-controlled and biased nature of the underlying pretraining
data, which consequently leads to severe ethical issues and even bias
amplification. Recent work has predominantly focused on measuring and
mitigating bias in pretrained language models. Surprisingly, the landscape of
bias measurements and mitigation resources and methods for conversational
language models is still very scarce: it is limited to only a few types of
bias, artificially constructed resources, and completely ignores the impact
that debiasing methods may have on the final performance in dialog tasks, e.g.,
conversational response generation. In this work, we present RedditBias, the
first conversational data set grounded in the actual human conversations from
Reddit, allowing for bias measurement and mitigation across four important bias
dimensions: gender, race, religion, and queerness. Further, we develop an
evaluation framework which simultaneously 1) measures bias on the developed
RedditBias resource, and 2) evaluates model capability in dialog tasks after
model debiasing. We use the evaluation framework to benchmark the widely used
conversational DialoGPT model along with the adaptations of four debiasing
methods. Our results indicate that DialoGPT is biased with respect to religious
groups and that some debiasing techniques can remove this bias while preserving
downstream task performance.
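The paired-evaluation idea behind such bias measurement can be sketched as follows: build counterfactual sentence pairs that differ only in the demographic term and compare a model's scores on the two sides. The helper names and the generic `score` callable below are illustrative assumptions, not the paper's exact procedure (which scores pairs with a language model and applies significance testing).

```python
def counterfactual_pair(sentence, term_a, term_b):
    """Return the sentence with one demographic term swapped for its
    counterpart (e.g. a religion-dimension swap). Illustrative helper."""
    return sentence.replace(term_a, term_b)

def mean_bias(sentences, term_a, term_b, score):
    """Mean difference between a scorer's value on the original sentences
    and on their counterfactuals. Values far from zero suggest the scorer
    treats the two groups differently; `score` stands in for a real
    language-model perplexity here."""
    diffs = [score(s) - score(counterfactual_pair(s, term_a, term_b))
             for s in sentences]
    return sum(diffs) / len(diffs)
```

In the actual evaluation, the score would come from a conversational model such as DialoGPT, and the group comparison would be a statistical test over many Reddit-derived pairs rather than a single mean difference.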
Related papers
- Projective Methods for Mitigating Gender Bias in Pre-trained Language Models [10.418595661963062]
Projective methods are fast to implement, use a small number of saved parameters, and make no updates to the existing model parameters.
We find that projective methods can be effective at both intrinsic bias and downstream bias mitigation, but that the two outcomes are not necessarily correlated.
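The core projective operation can be sketched in a few lines: estimate a bias direction and remove each embedding's component along it. Passing a single difference vector as the direction is a simplifying assumption here; real methods typically estimate the direction from many definitional pairs (e.g. via PCA).

```python
import numpy as np

def project_out(embedding, bias_direction):
    """Remove the component of `embedding` that lies along
    `bias_direction`, keeping only the part orthogonal to the
    (unit-normalised) bias direction."""
    d = bias_direction / np.linalg.norm(bias_direction)
    return embedding - np.dot(embedding, d) * d
```

No model parameters are updated: the projection is applied to representations, which is what makes these methods cheap to deploy.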
arXiv Detail & Related papers (2024-03-27T17:49:31Z)
- Bias in Opinion Summarisation from Pre-training to Adaptation: A Case Study in Political Bias [4.964212137957899]
Opinion summarisation aims to summarise the salient information and opinions presented in documents such as product reviews, discussion forums, and social media texts.
Generating biased summaries risks swaying public opinion.
arXiv Detail & Related papers (2024-02-01T04:15:59Z)
- Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods focus only on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Debiasing Stance Detection Models with Counterfactual Reasoning and Adversarial Bias Learning [15.68462203989933]
Stance detection models tend to rely on dataset bias in the text part as a shortcut.
We propose an adversarial bias learning module to model the bias more accurately.
arXiv Detail & Related papers (2022-12-20T16:20:56Z)
- Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias [2.6304695993930594]
We present a survey to comprehend bias in large pre-trained language models, analyze the stages at which they occur, and various ways in which these biases could be quantified and mitigated.
Considering the wide applicability of downstream tasks based on textual affective computing in real-world systems such as business, healthcare, and education, we place special emphasis on investigating bias in the context of affect (emotion), i.e., Affective Bias.
We present a summary of various bias evaluation corpora that can aid future research and discuss the challenges of researching bias in pre-trained language models.
arXiv Detail & Related papers (2022-04-21T18:51:19Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
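A generic cosine-based association score in the spirit of such metrics (this is not the exact SAME definition, which should be checked against the paper) compares a target vector's mean similarity to two attribute sets:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association_score(target, attrs_a, attrs_b):
    """Difference in mean cosine similarity between a target embedding and
    two attribute sets; zero indicates no measured preference for either."""
    mean_a = sum(cosine(target, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(target, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b
```

SAME's contribution is to improve on scores of this family, which are known to be sensitive to how attribute sets are chosen and normalised.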
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-Trained Language Models [4.937002982255573]
Recent work has shown that pre-trained language models capture social biases from the text corpora they are trained on.
We evaluate five recently proposed debiasing techniques: Counterfactual Data Augmentation, Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias.
We quantify the effectiveness of each technique using three different bias benchmarks while also measuring the impact of these techniques on a model's language modeling ability.
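Of these, Counterfactual Data Augmentation is the simplest to sketch: pair each training sentence with a copy in which demographic terms are swapped, so the model sees both variants. The swap table below is a toy assumption and ignores ambiguous cases (e.g. possessive vs. object "her"), which real implementations handle with part-of-speech information.

```python
# Toy bidirectional swap list; real word lists are much larger.
SWAPS = {"he": "she", "she": "he", "him": "her", "his": "her",
         "her": "his", "man": "woman", "woman": "man"}

def augment(sentence):
    """Return the original sentence plus its gender-swapped counterfactual,
    or just the original if no listed term occurs."""
    swapped = " ".join(SWAPS.get(w, w) for w in sentence.lower().split())
    return [sentence, swapped] if swapped != sentence.lower() else [sentence]
```

Training on the augmented corpus pushes the model toward treating the swapped groups symmetrically, at the cost of roughly doubling the data containing demographic terms.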
arXiv Detail & Related papers (2021-10-16T09:40:30Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
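A minimal sketch of the reweighting idea, assuming weights inversely proportional to the frequency of each (label, demographic) combination — the paper's exact scheme may differ:

```python
from collections import Counter

def instance_weights(labels, groups):
    """Assign each training instance a weight inversely proportional to the
    frequency of its (label, demographic-group) pair, so over-represented
    combinations contribute less to the loss. Weights sum to len(labels),
    and every pair receives the same total weight."""
    counts = Counter(zip(labels, groups))
    n = len(labels)
    return [n / (len(counts) * counts[(y, g)])
            for y, g in zip(labels, groups)]
```

Feeding these weights into a standard weighted loss directly counteracts label–demographic correlations, which is what distinguishes this approach from methods that ignore such correlations.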
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.