LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains
- URL: http://arxiv.org/abs/2407.14344v2
- Date: Tue, 22 Oct 2024 16:59:12 GMT
- Title: LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains
- Authors: Raphael Hernandes, Giulio Corsi
- Abstract summary: This research investigates whether OpenAI's GPT-4, a state-of-the-art large language model, can accurately classify the political bias of news sources based solely on their URLs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research investigates whether OpenAI's GPT-4, a state-of-the-art large language model, can accurately classify the political bias of news sources based solely on their URLs. Given the subjective nature of political labels, third-party bias ratings like those from Ad Fontes Media, AllSides, and Media Bias/Fact Check (MBFC) are often used in research to analyze news source diversity. This study aims to determine if GPT-4 can replicate these human ratings on a seven-degree scale ("far-left" to "far-right"). The analysis compares GPT-4's classifications against MBFC's and controls for website popularity using Open PageRank scores. Findings reveal a high correlation ($\text{Spearman's } \rho = .89$, $n = 5,877$, $p < 0.001$) between GPT-4's and MBFC's ratings, indicating the model's potential reliability. However, GPT-4 abstained from classifying approximately $\frac{2}{3}$ of the dataset. It is more likely to abstain from rating unpopular websites, which also suffer from less accurate assessments. The LLM tends to avoid classifying sources that MBFC considers to be centrist, resulting in more polarized outputs. Finally, this analysis shows a slight leftward skew in GPT's classifications compared to MBFC's. Therefore, this paper suggests that while GPT-4 can be a scalable, cost-effective tool for political bias classification of news websites, it should be used as a complement to human judgment to mitigate biases.
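A minimal sketch of the paper's headline comparison, under stated assumptions: the label-to-integer mapping, variable names, and sample ratings below are illustrative, not taken from the paper. It maps the seven-degree scale to integers, drops GPT-4's abstentions, and computes Spearman's $\rho$ with SciPy:

```python
# Minimal sketch (not the authors' code): compare GPT-4 and MBFC bias
# ratings on a seven-degree scale via Spearman's rank correlation.
# The scale mapping and sample data are illustrative assumptions.
from scipy.stats import spearmanr

SCALE = {
    "far-left": -3, "left": -2, "center-left": -1, "center": 0,
    "center-right": 1, "right": 2, "far-right": 3,
}

# Hypothetical ratings per domain; None marks a GPT-4 abstention
# (the paper reports abstentions on roughly 2/3 of the dataset).
gpt4 = ["left", "center", None, "far-right", "center-left"]
mbfc = ["center-left", "center", "center", "far-right", "center-left"]

# Keep only the domains GPT-4 actually rated, then rank-correlate.
pairs = [(SCALE[g], SCALE[m]) for g, m in zip(gpt4, mbfc) if g is not None]
gpt_scores, mbfc_scores = zip(*pairs)

rho, p = spearmanr(gpt_scores, mbfc_scores)
print(f"Spearman's rho = {rho:.2f}, p = {p:.3f}")
```

In the paper's setting, the analogous computation over the $n = 5,877$ domains GPT-4 did rate is what yields the reported $\rho = .89$.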
Related papers
- Is GPT-4 Less Politically Biased than GPT-3.5? A Renewed Investigation of ChatGPT's Political Biases
This work investigates the political biases and personality traits of ChatGPT, specifically comparing GPT-3.5 to GPT-4.
The Political Compass Test and the Big Five Personality Test were employed 100 times for each scenario.
The responses were analyzed by computing averages, standard deviations, and performing significance tests to investigate differences between GPT-3.5 and GPT-4.
Correlations were found for traits that have been shown to be interdependent in human studies.
arXiv Detail & Related papers (2024-10-28T13:32:52Z) - Identifying the sources of ideological bias in GPT models through linguistic variation in output
We use linguistic variation in countries with contrasting political attitudes to evaluate bias in GPT responses to sensitive political topics.
We find GPT output is more conservative in languages that map well onto conservative societies.
Differences across languages observed in GPT-3.5 persist in GPT-4, even though GPT-4 is significantly more liberal due to OpenAI's filtering policy.
arXiv Detail & Related papers (2024-09-09T20:11:08Z) - Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games [56.70628673595041]
Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic decision-making abilities remain largely unexplored.
This work investigates the performance and merits of LLMs in two canonical game-theoretic, two-player non-zero-sum games: Stag Hunt and the Prisoner's Dilemma.
Our structured evaluation of GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B shows that these models, when making decisions in these games, are affected by at least one systematic bias.
arXiv Detail & Related papers (2024-07-05T12:30:02Z) - Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of Large Language Models' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses.
We propose three attack approaches, i.e., Disguise, Deception, and Teaching, based on which we build evaluation datasets for four common bias types.
arXiv Detail & Related papers (2024-06-20T06:42:08Z) - Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z) - Behind the Screen: Investigating ChatGPT's Dark Personality Traits and Conspiracy Beliefs
This paper analyzes the dark personality traits and conspiracy beliefs of GPT-3.5 and GPT-4.
Dark personality traits and conspiracy beliefs were not particularly pronounced in either model.
arXiv Detail & Related papers (2024-02-06T16:03:57Z) - Is GPT-4 a reliable rater? Evaluating Consistency in GPT-4 Text Ratings [63.35165397320137]
This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4.
The model rated responses to tasks within the Higher Education subject domain of macroeconomics in terms of their content and style.
arXiv Detail & Related papers (2023-08-03T12:47:17Z) - DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z) - Is GPT-4 a Good Data Analyst? [67.35956981748699]
We consider GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains.
We design several task-specific evaluation metrics to systematically compare the performance between several professional human data analysts and GPT-4.
Experimental results show that GPT-4 can achieve comparable performance to humans.
arXiv Detail & Related papers (2023-05-24T11:26:59Z) - ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning
This paper assesses the accuracy, reliability and bias of the Large Language Model (LLM) ChatGPT-4 on the text analysis task of classifying the political affiliation of a Twitter poster based on the content of a tweet.
We use Twitter messages from United States politicians during the 2020 election, providing a ground truth against which to measure accuracy.
arXiv Detail & Related papers (2023-04-13T14:51:40Z) - Accuracy and Political Bias of News Source Credibility Ratings by Large Language Models
This paper audits eight widely used large language models (LLMs) from three major providers to evaluate their ability to discern credible and high-quality information sources.
We find that larger models more frequently refuse to provide ratings due to insufficient information, whereas smaller models are more prone to hallucination in their ratings.
arXiv Detail & Related papers (2023-04-01T05:04:06Z)