Related papers: On the Relationship between Truth and Political Bias in Language Models

URL: http://arxiv.org/abs/2409.05283v2
Date: Fri, 11 Oct 2024 20:10:53 GMT
Title: On the Relationship between Truth and Political Bias in Language Models
Authors: Suyash Fulay, William Brannon, Shrestha Mohanty, Cassandra Overney, Elinor Poole-Dayan, Deb Roy, Jad Kabbara
Abstract summary: We focus on analyzing the relationship between two concepts essential in both language model alignment and political science. We train reward models on various popular truthfulness datasets and evaluate their political bias. Our findings reveal that optimizing reward models for truthfulness on these datasets tends to result in a left-leaning political bias.
Score: 22.57096615768638
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language model alignment research often attempts to ensure that models are not only helpful and harmless, but also truthful and unbiased. However, optimizing these objectives simultaneously can obscure how improving one aspect might impact the others. In this work, we focus on analyzing the relationship between two concepts essential in both language model alignment and political science: truthfulness and political bias. We train reward models on various popular truthfulness datasets and subsequently evaluate their political bias. Our findings reveal that optimizing reward models for truthfulness on these datasets tends to result in a left-leaning political bias. We also find that existing open-source reward models (i.e., those trained on standard human preference datasets) already show a similar bias and that the bias is larger for larger models. These results raise important questions about the datasets used to represent truthfulness, potential limitations of aligning models to be both truthful and politically unbiased, and what language models capture about the relationship between truth and politics.

Related papers

Language-Dependent Political Bias in AI: A Study of ChatGPT and Gemini [0.0]
This study investigates the political tendency of large language models and the existence of differentiation according to the query language. ChatGPT and Gemini were subjected to a political axis test using 14 different languages. A comparative analysis revealed that Gemini exhibited a more pronounced liberal and left-wing tendency compared to ChatGPT.
arXiv Detail & Related papers (2025-04-08T21:13:01Z)
Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models [4.8869340671593475]
Political bias in prompt-based language models can affect their performance. We build on survey design principles to test a wide variety of input prompts, while taking into account prompt sensitivity. We compute political bias profiles across different prompt variations and find that measures of political bias are often unstable.
arXiv Detail & Related papers (2025-03-20T13:51:06Z)
BiasConnect: Investigating Bias Interactions in Text-to-Image Models [73.76853483463836]
We introduce BiasConnect, a novel tool designed to analyze and quantify bias interactions in Text-to-Image models. Our method provides empirical estimates that indicate how other bias dimensions shift toward or away from an ideal distribution when a given bias is modified. We demonstrate the utility of BiasConnect for selecting optimal bias mitigation axes, comparing different TTI models on the dependencies they learn, and understanding the amplification of intersectional societal biases in TTI models.
arXiv Detail & Related papers (2025-03-12T19:01:41Z)
Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries [85.909363478929]
In this study, we focus on 19 real-world statistics collected from authoritative sources. We develop a checklist comprising objective and subjective queries to analyze behavior of large language models. We propose metrics to assess factuality and fairness, and formally prove the inherent trade-off between these two aspects.
arXiv Detail & Related papers (2025-02-09T10:54:11Z)
Balancing Transparency and Accuracy: A Comparative Analysis of Rule-Based and Deep Learning Models in Political Bias Classification [5.550237524713089]
The study highlights the sensitivity of modern self-learning systems to unconstrained data ingestion. Applying both models to left-leaning (CNN) and right-leaning (FOX) news articles, we assess their effectiveness on data beyond the original training and test sets. We contrast the opaque architecture of a deep learning model with the transparency of a linguistically informed rule-based model.
arXiv Detail & Related papers (2024-11-07T00:09:18Z)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs). By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
High Risk of Political Bias in Black Box Emotion Inference Models [0.0]
This paper investigates the presence of political bias in machine learning models used for sentiment analysis (SA) in social science research. We conducted a bias audit on a Polish sentiment analysis model developed in our lab. Our findings indicate that annotations by human raters propagate political biases into the model's predictions.
arXiv Detail & Related papers (2024-07-18T20:31:07Z)
Representation Bias in Political Sample Simulations with Large Language Models [54.48283690603358]
This study seeks to identify and quantify biases in simulating political samples with Large Language Models. Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z)
Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios. Existing debiasing methods suffer from high costs in bias labeling or model re-training. We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
It's All Relative: Interpretable Models for Scoring Bias in Documents [10.678219157857946]
We propose an interpretable model to score the bias present in web documents, based only on their textual content. Our model incorporates assumptions reminiscent of the Bradley-Terry axioms and is trained on pairs of revisions of the same Wikipedia article. We show that we can interpret the parameters of the trained model to discover the words most indicative of bias.
arXiv Detail & Related papers (2023-07-16T19:35:38Z)
Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
Mitigating Political Bias in Language Models Through Reinforced Calibration [6.964628305312507]
We describe metrics for measuring political bias in GPT-2 generation. We propose a reinforcement learning (RL) framework for mitigating political biases in generated text.
arXiv Detail & Related papers (2021-04-30T07:21:30Z)
Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
Inflating Topic Relevance with Ideology: A Case Study of Political Ideology Bias in Social Topic Detection Models [16.279854003220418]
We investigate the impact of political ideology biases in training data. Our work highlights the susceptibility of large, complex models to propagating the biases from human-selected input. As a way to mitigate the bias, we propose to learn a text representation that is invariant to political ideology while still judging topic relevance.
arXiv Detail & Related papers (2020-11-29T05:54:03Z)
Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases. First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method. The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.