ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political
Twitter Messages with Zero-Shot Learning
- URL: http://arxiv.org/abs/2304.06588v1
- Date: Thu, 13 Apr 2023 14:51:40 GMT
- Title: ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political
Twitter Messages with Zero-Shot Learning
- Authors: Petter Törnberg
- Abstract summary: This paper assesses the accuracy, reliability and bias of the Large Language Model (LLM) ChatGPT-4 on the text analysis task of classifying the political affiliation of a Twitter poster based on the content of a tweet.
We use Twitter messages from United States politicians during the 2020 election, providing a ground truth against which to measure accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper assesses the accuracy, reliability and bias of the Large Language
Model (LLM) ChatGPT-4 on the text analysis task of classifying the political
affiliation of a Twitter poster based on the content of a tweet. The LLM is
compared to manual annotation by both expert classifiers and crowd workers,
generally considered the gold standard for such tasks. We use Twitter messages
from United States politicians during the 2020 election, providing a ground
truth against which to measure accuracy. The paper finds that ChatGPT-4
achieves higher accuracy, higher reliability, and equal or lower bias than the
human classifiers. The LLM is able to correctly annotate messages that require
reasoning on the basis of contextual knowledge, and inferences around the
author's intentions - traditionally seen as uniquely human abilities. These
findings suggest that LLMs will have a substantial impact on the use of textual
data in the social sciences by enabling interpretive research at scale.
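For readers who want to see the shape of such a zero-shot setup, the following is a minimal sketch assuming the OpenAI Python client (openai>=1.0); the prompt wording is illustrative only, not the exact prompt used in the paper.

```python
# Minimal sketch of zero-shot political-affiliation annotation with an LLM.
# The prompt below is an illustrative assumption, not the paper's prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You will be shown a tweet written by a United States politician "
    "during the 2020 election. Classify the author's political affiliation. "
    "Answer with exactly one word: Democrat or Republican.\n\nTweet: {tweet}"
)

def classify_tweet(tweet: str, model: str = "gpt-4") -> str:
    """Return the model's one-word affiliation label for a single tweet."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output helps reliability
        messages=[{"role": "user", "content": PROMPT.format(tweet=tweet)}],
    )
    return response.choices[0].message.content.strip()

# Accuracy against the ground-truth party labels of the posting politicians:
# correct = sum(classify_tweet(t) == label for t, label in test_set)
```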
Related papers
- Toeing the Party Line: Election Manifestos as a Key to Understand Political Discourse on Twitter [15.698347233120993]
We use hashtags as a signal to fine-tune text representations without the need for manual annotation.
We find that our method yields stable positioning reflective of manifesto positioning, including when using all tweets of candidates.
This indicates that it is possible to reliably analyze the relative positioning of actors while forgoing manual annotation.
arXiv Detail & Related papers (2024-10-21T08:01:46Z)
- Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing [2.936331223824117]
The use of Large Language Models (LLMs) for automated text annotation in social media posts has garnered significant interest.
We analyze the performance of eight open-source and proprietary LLMs for annotating the stance expressed in social media posts.
A significant finding of our study is that the explicitness of text expressing a stance plays a critical role in how faithfully LLMs' stance judgments match humans'.
arXiv Detail & Related papers (2024-06-11T17:26:07Z)
- White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in content generated by 3 recent Large Language Models (LLMs).
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
- Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political orientation of Large Language Models (LLMs) across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z)
- What Evidence Do Language Models Find Convincing? [94.90663008214918]
We build a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts.
We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions.
Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important.
arXiv Detail & Related papers (2024-02-19T02:15:34Z)
- Positioning Political Texts with Large Language Models by Asking and Averaging [0.0]
We ask an LLM where a tweet or a sentence of a political text stands on the focal dimension and take the average of the LLM responses to position political actors.
The correlations between the position estimates obtained with the best LLMs and benchmarks based on text coding by experts, crowdworkers, or roll call votes exceed .90.
Using instruction-tuned LLMs to position texts in policy and ideological spaces is fast, cost-efficient, reliable, and reproducible (in the case of open LLMs) even if the texts are short and written in different languages.
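A minimal sketch of this asking-and-averaging idea, assuming a hypothetical ask_llm helper that wraps whatever LLM API is available:

```python
# Sketch of "asking and averaging": query an LLM for a position score per
# text snippet, then average the scores over all snippets of an actor.
# `ask_llm` is a hypothetical helper, not an API from the paper.
from statistics import mean

SCALE_PROMPT = (
    "On a left-right scale from 0 (far left) to 10 (far right), where does "
    "the author of this text stand? Reply with a number only.\n\n{text}"
)

def position_estimate(snippets: list[str], ask_llm) -> float:
    """Average the LLM's numeric placements over an actor's snippets."""
    scores = []
    for snippet in snippets:
        reply = ask_llm(SCALE_PROMPT.format(text=snippet))
        try:
            scores.append(float(reply.strip()))
        except ValueError:
            continue  # skip malformed replies rather than guessing
    return mean(scores)  # raises StatisticsError if no reply parsed
```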
arXiv Detail & Related papers (2023-11-28T09:45:02Z)
- The Perils & Promises of Fact-checking with Large Language Models [55.869584426820715]
Large Language Models (LLMs) are increasingly trusted to write academic papers, lawsuits, and news articles.
We evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions.
Our results show that LLMs perform markedly better when equipped with contextual information.
While LLMs show promise in fact-checking, caution is essential due to inconsistent accuracy.
arXiv Detail & Related papers (2023-10-20T14:49:47Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Tweets2Stance: Users stance detection exploiting Zero-Shot Learning Algorithms on Tweets [0.06372261626436675]
The aim of the study is to predict the stance of a party p with regard to each statement s by exploiting what the party's Twitter account wrote on Twitter.
Results obtained from multiple experiments show that Tweets2Stance can correctly predict the stance with a general minimum MAE of 1.13, which is a great achievement considering the task complexity.
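One way to obtain zero-shot stance signals of this kind, as an illustration rather than the authors' exact pipeline, is an off-the-shelf NLI-based zero-shot classifier from HuggingFace transformers, together with the MAE metric the paper reports:

```python
# Illustrative zero-shot stance scoring for tweets, loosely in the spirit of
# Tweets2Stance, plus the mean absolute error (MAE) metric reported above.
# This is a sketch under assumptions, not the authors' implementation.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

def agreement_score(tweet: str, statement: str) -> float:
    """Probability that the tweet agrees with the statement."""
    agree, disagree = f"agrees that {statement}", f"disagrees that {statement}"
    result = classifier(tweet, candidate_labels=[agree, disagree])
    return result["scores"][result["labels"].index(agree)]

def mae(predicted: list[float], gold: list[float]) -> float:
    """Mean absolute error between predicted and gold stance values."""
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(predicted)
```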
arXiv Detail & Related papers (2022-04-22T14:00:11Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
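As a rough illustration of this last setup, the sketch below trains an XGBoost classifier on stand-in account features and explains it with SHAP; the feature names and data are hypothetical placeholders, not the paper's feature set.

```python
# Sketch of supervised bot identification: XGBoost on account-level features,
# with per-prediction explanations via SHAP. Features and labels here are
# random placeholders, assumed purely for illustration.
import numpy as np
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
FEATURES = ["followers_count", "tweets_per_day", "account_age_days"]
X = rng.random((500, len(FEATURES)))   # stand-in feature matrix
y = rng.integers(0, 2, size=500)       # 1 = bot, 0 = human

model = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
model.fit(X, y)

# SHAP attributes each prediction to individual features, which is what makes
# the model's bot/human decisions explainable per account.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(shap_values.shape)  # (500, 3): one attribution per account and feature
```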
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.