From Pretraining Data to Language Models to Downstream Tasks: Tracking
the Trails of Political Biases Leading to Unfair NLP Models
- URL: http://arxiv.org/abs/2305.08283v3
- Date: Thu, 6 Jul 2023 00:40:53 GMT
- Title: From Pretraining Data to Language Models to Downstream Tasks: Tracking
the Trails of Political Biases Leading to Unfair NLP Models
- Authors: Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov
- Abstract summary: Language models (LMs) are pretrained on diverse data sources, including news, discussion forums, books, and online encyclopedias.
We develop new methods to (1) measure political biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs.
We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks.
- Score: 40.89699305385269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) are pretrained on diverse data sources, including news,
discussion forums, books, and online encyclopedias. A significant portion of
this data includes opinions and perspectives which, on one hand, celebrate
democracy and diversity of ideas, and on the other hand are inherently socially
biased. Our work develops new methods to (1) measure political biases in LMs
trained on such corpora, along social and economic axes, and (2) measure the
fairness of downstream NLP models trained on top of politically biased LMs. We
focus on hate speech and misinformation detection, aiming to empirically
quantify the effects of political (social, economic) biases in pretraining data
on the fairness of high-stakes social-oriented tasks. Our findings reveal that
pretrained LMs do have political leanings that reinforce the polarization
present in pretraining corpora, propagating social biases into hate speech
predictions and misinformation detectors. We discuss the implications of our
findings for NLP research and propose future directions to mitigate unfairness.
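As a rough illustration of how such political leanings can be probed (a simplified sketch, not the authors' exact protocol), one can score a masked LM's agreement with Political Compass-style statements and aggregate the scores along social and economic axes. The prompt template, the agree/disagree word lists, and the example statements below are assumptions made for illustration only.

```python
# Illustrative sketch: probe a masked LM's stance toward Political
# Compass-style statements. The prompt, stance lexicon, and statements
# are assumptions, not the paper's official test items or prompts.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# (axis, sign of agreement) -> statement; placeholders for illustration
statements = {
    ("economic", +1): "Government regulation of business does more harm than good.",
    ("social", -1): "Same-sex marriage should be legal.",
}

AGREE = {"agree", "support"}
DISAGREE = {"disagree", "oppose"}

def stance(statement):
    """Score in [-1, 1]; positive means the LM leans toward agreement."""
    prompt = f'Please respond to the statement: "{statement}" I [MASK] with it.'
    agree_p = disagree_p = 0.0
    for cand in fill_mask(prompt, top_k=100):
        word = cand["token_str"].strip().lower()
        if word in AGREE:
            agree_p += cand["score"]
        elif word in DISAGREE:
            disagree_p += cand["score"]
    total = agree_p + disagree_p
    return 0.0 if total == 0.0 else (agree_p - disagree_p) / total

for (axis, sign), text in statements.items():
    print(f"{axis}: {sign * stance(text):+.2f}")
```

A downstream fairness check in the same spirit would compare, for example, false positive rates of a hate speech classifier on texts mentioning different identity or partisan groups; this too is an approximation of the paper's evaluation rather than its exact setup.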
Related papers
- Uncovering Political Bias in Emotion Inference Models: Implications for sentiment analysis in social science research [0.0]
This paper investigates the presence of political bias in machine learning models used for sentiment analysis (SA) in social science research.
We conducted a bias audit on a Polish sentiment analysis model developed in our lab.
Our findings indicate that annotations by human raters propagate political biases into the model's predictions.
arXiv Detail & Related papers (2024-07-18T20:31:07Z) - Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z) - Inducing Political Bias Allows Language Models Anticipate Partisan
Reactions to Controversies [5.958974943807783]
This study addresses the challenge of understanding political bias in digitized discourse using Large Language Models (LLMs).
We present a comprehensive analytical framework, consisting of Partisan Bias Divergence Assessment and Partisan Class Tendency Prediction.
Our findings reveal the model's effectiveness in capturing emotional and moral nuances, albeit with some challenges in stance detection.
arXiv Detail & Related papers (2023-11-16T08:57:53Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z) - Learning Unbiased News Article Representations: A Knowledge-Infused
Approach [0.0]
We propose a knowledge-infused deep learning model that learns unbiased representations of news articles using global and local contexts.
We show that the proposed model mitigates algorithmic political bias and outperforms baseline methods to predict the political leaning of news articles with up to 73% accuracy.
arXiv Detail & Related papers (2023-09-12T06:20:34Z) - Quantitative Analysis of Forecasting Models: In the Aspect of Online
Political Bias [0.0]
We propose an approach to classify social media posts into five distinct political leaning categories.
Our approach involves utilizing existing time series forecasting models on two social media datasets with different political ideologies.
arXiv Detail & Related papers (2023-09-11T16:17:24Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - A Survey of Methods for Addressing Class Imbalance in Deep-Learning
Based Natural Language Processing [68.37496795076203]
We provide guidance for NLP researchers and practitioners dealing with imbalanced data.
We first discuss various types of controlled and real-world class imbalance.
We organize the methods by whether they are based on sampling, data augmentation, choice of loss function, staged learning, or model design.
arXiv Detail & Related papers (2022-10-10T13:26:40Z) - A Machine Learning Pipeline to Examine Political Bias with Congressional
Speeches [0.3062386594262859]
We present machine learning approaches to study political bias in two ideologically diverse social media forums: Gab and Twitter.
Our proposed methods exploit the use of transcripts collected from political speeches in US congress to label the data.
We also present a machine learning approach that combines features from cascades and text to forecast a cascade's political bias with an accuracy of about 85%.
arXiv Detail & Related papers (2021-09-18T21:15:21Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.