Identifying the sources of ideological bias in GPT models through linguistic variation in output
- URL: http://arxiv.org/abs/2409.06043v1
- Date: Mon, 9 Sep 2024 20:11:08 GMT
- Title: Identifying the sources of ideological bias in GPT models through linguistic variation in output
- Authors: Christina Walker, Joan C. Timoneda
- Abstract summary: We use linguistic variation in countries with contrasting political attitudes to evaluate bias in GPT responses to sensitive political topics.
We find GPT output is more conservative in languages that map well onto conservative societies.
Differences across languages observed in GPT-3.5 persist in GPT-4, even though GPT-4 is significantly more liberal due to OpenAI's filtering policy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extant work shows that generative AI models such as GPT-3.5 and 4 perpetuate social stereotypes and biases. One concerning but less explored source of bias is ideology. Do GPT models take ideological stances on politically sensitive topics? In this article, we provide an original approach to identifying ideological bias in generative models, showing that bias can stem from both the training data and the filtering algorithm. We leverage linguistic variation in countries with contrasting political attitudes to evaluate bias in average GPT responses to sensitive political topics in those languages. First, we find that GPT output is more conservative in languages that map well onto conservative societies (i.e., Polish), and more liberal in languages used uniquely in liberal societies (i.e., Swedish). This result provides strong evidence of training data bias in GPT models. Second, differences across languages observed in GPT-3.5 persist in GPT-4, even though GPT-4 is significantly more liberal due to OpenAI's filtering policy. Our main takeaway is that generative model training must focus on high-quality, curated datasets to reduce bias, even if it entails a compromise in training data size. Filtering responses after training only introduces new biases and does not remove the underlying training biases.
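A minimal sketch of this cross-lingual probing setup, assuming the OpenAI Python SDK (v1); the question, its translations, the model names, and the sampling parameters are illustrative stand-ins rather than the authors' exact protocol:

```python
# Sketch: pose the same politically sensitive question in several languages
# and collect repeated completions for later ideological scoring.
# Prompts and languages below are illustrative examples only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The same question rendered in each target language (example translations).
PROMPTS = {
    "Polish": "Co sądzisz o małżeństwach osób tej samej płci?",
    "Swedish": "Vad tycker du om samkönade äktenskap?",
    "English": "What do you think about same-sex marriage?",
}

def collect_responses(model: str, n_samples: int = 50) -> dict[str, list[str]]:
    """Sample repeated completions per language so averages are meaningful."""
    responses: dict[str, list[str]] = {lang: [] for lang in PROMPTS}
    for lang, prompt in PROMPTS.items():
        for _ in range(n_samples):
            completion = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=1.0,  # keep sampling variation; average over draws
            )
            responses[lang].append(completion.choices[0].message.content)
    return responses

# Compare the two model generations on identical prompts.
gpt35 = collect_responses("gpt-3.5-turbo")
gpt4 = collect_responses("gpt-4")
```

The collected responses would then be scored on a liberal-conservative scale and averaged per language and per model, as the abstract describes.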
Related papers
- Is GPT-4 Less Politically Biased than GPT-3.5? A Renewed Investigation of ChatGPT's Political Biases [0.0]
This work investigates the political biases and personality traits of ChatGPT, specifically comparing GPT-3.5 to GPT-4.
The Political Compass Test and the Big Five Personality Test were employed 100 times for each scenario.
The responses were analyzed by computing averages, standard deviations, and performing significance tests to investigate differences between GPT-3.5 and GPT-4.
Correlations were found for traits that have been shown to be interdependent in human studies.
arXiv Detail & Related papers (2024-10-28T13:32:52Z)
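A sketch of the comparison described in the entry above: given one numeric score per test run (e.g., a Political Compass axis value per administration), the means, standard deviations, and a significance test follow directly. The synthetic scores and the choice of Welch's t-test are assumptions; the summary specifies only "significance tests".

```python
# Sketch: compare repeated test scores for GPT-3.5 vs GPT-4 with means,
# standard deviations, and a significance test. The synthetic scores below
# are placeholders for 100 real administrations of the Political Compass
# Test; substitute the actual per-run axis values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gpt35_scores = rng.normal(loc=-4.0, scale=1.0, size=100)  # placeholder data
gpt4_scores = rng.normal(loc=-5.0, scale=1.0, size=100)   # placeholder data

for name, scores in [("GPT-3.5", gpt35_scores), ("GPT-4", gpt4_scores)]:
    print(f"{name}: mean={scores.mean():.2f}, sd={scores.std(ddof=1):.2f}")

# Welch's t-test (unequal variances): are the mean positions distinguishable?
t_stat, p_value = stats.ttest_ind(gpt35_scores, gpt4_scores, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```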
- From Lists to Emojis: How Format Bias Affects Model Alignment [67.08430328350327]
We study format biases in reinforcement learning from human feedback.
Many widely-used preference models, including human evaluators, exhibit strong biases towards specific format patterns.
We show that with a small amount of biased data, we can inject significant bias into the reward model.
arXiv Detail & Related papers (2024-09-18T05:13:18Z)
- On the Relationship between Truth and Political Bias in Language Models [22.57096615768638]
We analyze the relationship between truthfulness and political bias, two concepts essential in both language model alignment and political science.
We train reward models on various popular truthfulness datasets and evaluate their political bias.
Our findings reveal that optimizing reward models for truthfulness on these datasets tends to result in a left-leaning political bias.
arXiv Detail & Related papers (2024-09-09T02:28:53Z)
- LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains [0.0]
This research investigates whether OpenAI's GPT-4, a state-of-the-art large language model, can accurately classify the political bias of news sources based solely on their URLs.
arXiv Detail & Related papers (2024-07-19T14:28:07Z)
- Representation Bias in Political Sample Simulations with Large Language Models [54.48283690603358]
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z)
- Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of Large Language Models' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses.
We propose three attack approaches, i.e., Disguise, Deception, and Teaching, and build evaluation datasets for four common bias types based on them.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
- What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether this bias is encoded in the models' latent representations.
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
arXiv Detail & Related papers (2023-11-30T18:53:13Z)
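A minimal sketch of a logistic Bradley-Terry probe as described in the entry above, on synthetic data: each word gets a latent score s(w) = theta . h(w), and a no-intercept logistic regression on embedding differences h(a) - h(b) recovers theta. The dimensions and data here are placeholders, not the paper's setup.

```python
# Sketch: logistic Bradley-Terry probe. P(a preferred over b) is modeled as
# sigmoid(s(a) - s(b)) with s(w) = theta . h(w), so fitting a no-intercept
# logistic regression on h(a) - h(b) recovers the preference direction theta.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim, n_pairs = 64, 500

# Synthetic stand-ins for the LLM's hidden vectors of word pairs (a, b).
h_a = rng.normal(size=(n_pairs, dim))
h_b = rng.normal(size=(n_pairs, dim))
true_theta = rng.normal(size=dim)

# Labels: 1 if word a is preferred over word b under the latent scores.
logits = (h_a - h_b) @ true_theta
labels = (rng.random(n_pairs) < 1 / (1 + np.exp(-logits))).astype(int)

probe = LogisticRegression(fit_intercept=False, max_iter=1000)
probe.fit(h_a - h_b, labels)

# The learned coefficients act as the probe's preference direction; applying
# them to any word's hidden vector yields its Bradley-Terry score.
scores = h_a @ probe.coef_.ravel()
print("held-in accuracy:", probe.score(h_a - h_b, labels))
```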
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z)
- Mitigating Political Bias in Language Models Through Reinforced Calibration [6.964628305312507]
We describe metrics for measuring political bias in GPT-2 generation.
We propose a reinforcement learning (RL) framework for mitigating political biases in generated text.
arXiv Detail & Related papers (2021-04-30T07:21:30Z)
- How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.