Identifying the sources of ideological bias in GPT models through linguistic variation in output
- URL: http://arxiv.org/abs/2409.06043v1
- Date: Mon, 9 Sep 2024 20:11:08 GMT
- Title: Identifying the sources of ideological bias in GPT models through linguistic variation in output
- Authors: Christina Walker, Joan C. Timoneda
- Abstract summary: We use linguistic variation in countries with contrasting political attitudes to evaluate bias in GPT responses to sensitive political topics.
We find GPT output is more conservative in languages that map well onto conservative societies.
Differences across languages observed in GPT-3.5 persist in GPT-4, even though GPT-4 is significantly more liberal due to OpenAI's filtering policy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extant work shows that generative AI models such as GPT-3.5 and 4 perpetuate social stereotypes and biases. One concerning but less explored source of bias is ideology. Do GPT models take ideological stances on politically sensitive topics? In this article, we provide an original approach to identifying ideological bias in generative models, showing that bias can stem from both the training data and the filtering algorithm. We leverage linguistic variation in countries with contrasting political attitudes to evaluate bias in average GPT responses to sensitive political topics in those languages. First, we find that GPT output is more conservative in languages that map well onto conservative societies (i.e., Polish), and more liberal in languages used uniquely in liberal societies (i.e., Swedish). This result provides strong evidence of training data bias in GPT models. Second, differences across languages observed in GPT-3.5 persist in GPT-4, even though GPT-4 is significantly more liberal due to OpenAI's filtering policy. Our main takeaway is that generative model training must focus on high-quality, curated datasets to reduce bias, even if it entails a compromise in training data size. Filtering responses after training only introduces new biases and does not remove the underlying training biases.
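A minimal sketch of this cross-lingual probing setup, assuming the OpenAI Python SDK (v1); the question, its translations, the model names, and the sampling parameters are illustrative stand-ins rather than the authors' exact protocol:

```python
# Sketch: pose the same politically sensitive question in several languages
# and collect repeated completions for later ideological scoring.
# Prompts and languages below are illustrative examples only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The same question rendered in each target language (example translations).
PROMPTS = {
    "Polish": "Co sądzisz o małżeństwach osób tej samej płci?",
    "Swedish": "Vad tycker du om samkönade äktenskap?",
    "English": "What do you think about same-sex marriage?",
}

def collect_responses(model: str, n_samples: int = 50) -> dict[str, list[str]]:
    """Sample repeated completions per language so averages are meaningful."""
    responses: dict[str, list[str]] = {lang: [] for lang in PROMPTS}
    for lang, prompt in PROMPTS.items():
        for _ in range(n_samples):
            completion = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=1.0,  # keep sampling variation; average over draws
            )
            responses[lang].append(completion.choices[0].message.content)
    return responses

# Compare the two model generations on identical prompts.
gpt35 = collect_responses("gpt-3.5-turbo")
gpt4 = collect_responses("gpt-4")
```

The collected responses would then be scored on a liberal-conservative scale and averaged per language and per model, as the abstract describes.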
Related papers
- Is GPT-4 Less Politically Biased than GPT-3.5? A Renewed Investigation of ChatGPT's Political Biases [0.0]
This work investigates the political biases and personality traits of ChatGPT, specifically comparing GPT-3.5 to GPT-4.
The Political Compass Test and the Big Five Personality Test were employed 100 times for each scenario.
The responses were analyzed by computing averages, standard deviations, and performing significance tests to investigate differences between GPT-3.5 and GPT-4.
Correlations were found for traits that have been shown to be interdependent in human studies.
arXiv Detail & Related papers (2024-10-28T13:32:52Z)
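A sketch of the comparison described in the entry above: given one numeric score per test run (e.g., a Political Compass axis value per administration), the means, standard deviations, and a significance test follow directly. The synthetic scores and the choice of Welch's t-test are assumptions; the summary specifies only "significance tests".

```python
# Sketch: compare repeated test scores for GPT-3.5 vs GPT-4 with means,
# standard deviations, and a significance test. The synthetic scores below
# are placeholders for 100 real administrations of the Political Compass
# Test; substitute the actual per-run axis values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gpt35_scores = rng.normal(loc=-4.0, scale=1.0, size=100)  # placeholder data
gpt4_scores = rng.normal(loc=-5.0, scale=1.0, size=100)   # placeholder data

for name, scores in [("GPT-3.5", gpt35_scores), ("GPT-4", gpt4_scores)]:
    print(f"{name}: mean={scores.mean():.2f}, sd={scores.std(ddof=1):.2f}")

# Welch's t-test (unequal variances): are the mean positions distinguishable?
t_stat, p_value = stats.ttest_ind(gpt35_scores, gpt4_scores, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```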
- From Lists to Emojis: How Format Bias Affects Model Alignment [67.08430328350327]
We study format biases in reinforcement learning from human feedback.
Many widely-used preference models, including human evaluators, exhibit strong biases towards specific format patterns.
We show that with a small amount of biased data, we can inject significant bias into the reward model.
arXiv Detail & Related papers (2024-09-18T05:13:18Z)
- On the Relationship between Truth and Political Bias in Language Models [22.57096615768638]
We analyze the relationship between truthfulness and political bias, two concepts essential in both language model alignment and political science.
We train reward models on various popular truthfulness datasets and evaluate their political bias.
Our findings reveal that optimizing reward models for truthfulness on these datasets tends to result in a left-leaning political bias.
arXiv Detail & Related papers (2024-09-09T02:28:53Z)
- LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains [0.0]
This research investigates whether OpenAI's GPT-4, a state-of-the-art large language model, can accurately classify the political bias of news sources based solely on their URLs.
arXiv Detail & Related papers (2024-07-19T14:28:07Z)
- Representation Bias in Political Sample Simulations with Large Language Models [54.48283690603358]
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z)
- Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of Large Language Models' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses.
We propose three attack approaches, i.e., Disguise, Deception, and Teaching, and build evaluation datasets for four common bias types based on them.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
- What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether this bias is encoded in the models' latent representations.
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
arXiv Detail & Related papers (2023-11-30T18:53:13Z)
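A minimal sketch of a logistic Bradley-Terry probe as described in the entry above, on synthetic data: each word gets a latent score s(w) = theta . h(w), and a no-intercept logistic regression on embedding differences h(a) - h(b) recovers theta. The dimensions and data here are placeholders, not the paper's setup.

```python
# Sketch: logistic Bradley-Terry probe. P(a preferred over b) is modeled as
# sigmoid(s(a) - s(b)) with s(w) = theta . h(w), so fitting a no-intercept
# logistic regression on h(a) - h(b) recovers the preference direction theta.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim, n_pairs = 64, 500

# Synthetic stand-ins for the LLM's hidden vectors of word pairs (a, b).
h_a = rng.normal(size=(n_pairs, dim))
h_b = rng.normal(size=(n_pairs, dim))
true_theta = rng.normal(size=dim)

# Labels: 1 if word a is preferred over word b under the latent scores.
logits = (h_a - h_b) @ true_theta
labels = (rng.random(n_pairs) < 1 / (1 + np.exp(-logits))).astype(int)

probe = LogisticRegression(fit_intercept=False, max_iter=1000)
probe.fit(h_a - h_b, labels)

# The learned coefficients act as the probe's preference direction; applying
# them to any word's hidden vector yields its Bradley-Terry score.
scores = h_a @ probe.coef_.ravel()
print("held-in accuracy:", probe.score(h_a - h_b, labels))
```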
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z)
- Mitigating Political Bias in Language Models Through Reinforced Calibration [6.964628305312507]
We describe metrics for measuring political bias in GPT-2 generation.
We propose a reinforcement learning (RL) framework for mitigating political biases in generated text.
arXiv Detail & Related papers (2021-04-30T07:21:30Z)
- How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.