Mitigating Political Bias in Language Models Through Reinforced
Calibration
- URL: http://arxiv.org/abs/2104.14795v1
- Date: Fri, 30 Apr 2021 07:21:30 GMT
- Title: Mitigating Political Bias in Language Models Through Reinforced
Calibration
- Authors: Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Lili Wang, Soroush
Vosoughi
- Abstract summary: We describe metrics for measuring political bias in GPT-2 generation.
We propose a reinforcement learning (RL) framework for mitigating political biases in generated text.
- Score: 6.964628305312507
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current large-scale language models can be politically biased as a result of
the data they are trained on, potentially causing serious problems when they
are deployed in real-world settings. In this paper, we describe metrics for
measuring political bias in GPT-2 generation and propose a reinforcement
learning (RL) framework for mitigating political biases in generated text. By
using rewards from word embeddings or a classifier, our RL framework guides
debiased generation without having access to the training data or requiring the
model to be retrained. In empirical experiments on three attributes sensitive
to political bias (gender, location, and topic), our methods reduced bias
according to both our metrics and human evaluation, while maintaining
readability and semantic coherence.
Related papers
- Balancing Transparency and Accuracy: A Comparative Analysis of Rule-Based and Deep Learning Models in Political Bias Classification [5.550237524713089]
The study highlights the sensitivity of modern self-learning systems to unconstrained data ingestion.
Applying both models to left-leaning (CNN) and right-leaning (FOX) news articles, we assess their effectiveness on data beyond the original training and test sets.
We contrast the opaque architecture of a deep learning model with the transparency of a linguistically informed rule-based model.
arXiv Detail & Related papers (2024-11-07T00:09:18Z)
- REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning [18.064064773660174]
We introduce REFINE-LM, a debiasing method that uses reinforcement learning to handle different types of biases without any fine-tuning.
By training a simple model on top of the word probability distribution of a LM, our bias reinforcement learning method enables model debiasing without human annotations.
Experiments conducted on a wide range of models, including several LMs, show that our method significantly reduces stereotypical biases while preserving LM performance.
arXiv Detail & Related papers (2024-08-18T14:08:31Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of
Conversational Language Models [37.98671828283487]
Text representation models are prone to exhibit a range of societal biases.
Recent work has predominantly focused on measuring and mitigating bias in pretrained language models.
We present RedditBias, the first conversational data set grounded in actual human conversations from Reddit.
arXiv Detail & Related papers (2021-06-07T11:22:39Z)
- Impact of Gender Debiased Word Embeddings in Language Modeling [0.0]
Gender, race and social biases have been detected as evident examples of unfairness in applications of Natural Language Processing.
Recent studies have shown that the human-generated data used in training is an apparent source of these biases.
Current algorithms have also been proven to amplify biases from data.
arXiv Detail & Related papers (2021-05-03T14:45:10Z)
- Inflating Topic Relevance with Ideology: A Case Study of Political
Ideology Bias in Social Topic Detection Models [16.279854003220418]
We investigate the impact of political ideology biases in training data.
Our work highlights the susceptibility of large, complex models to propagating the biases from human-selected input.
As a way to mitigate the bias, we propose to learn a text representation that is invariant to political ideology while still judging topic relevance.
arXiv Detail & Related papers (2020-11-29T05:54:03Z)