ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political
Twitter Messages with Zero-Shot Learning
- URL: http://arxiv.org/abs/2304.06588v1
- Date: Thu, 13 Apr 2023 14:51:40 GMT
- Title: ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political
Twitter Messages with Zero-Shot Learning
- Authors: Petter Törnberg
- Abstract summary: This paper assesses the accuracy, reliability and bias of the Large Language Model (LLM) ChatGPT-4 on the text analysis task of classifying the political affiliation of a Twitter poster based on the content of a tweet.
We use Twitter messages from United States politicians during the 2020 election, providing a ground truth against which to measure accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper assesses the accuracy, reliability and bias of the Large Language
Model (LLM) ChatGPT-4 on the text analysis task of classifying the political
affiliation of a Twitter poster based on the content of a tweet. The LLM is
compared to manual annotation by both expert classifiers and crowd workers,
generally considered the gold standard for such tasks. We use Twitter messages
from United States politicians during the 2020 election, providing a ground
truth against which to measure accuracy. The paper finds that ChatGPT-4
achieves higher accuracy, higher reliability, and equal or lower bias than the
human classifiers. The LLM is able to correctly annotate messages that require
reasoning on the basis of contextual knowledge, and inferences around the
author's intentions - traditionally seen as uniquely human abilities. These
findings suggest that LLMs will have a substantial impact on the use of textual
data in the social sciences, by enabling interpretive research at scale.
Related papers
- Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing [2.936331223824117]
The use of Large Language Models (LLMs) for automated text annotation in social media posts has garnered significant interest.
We analyze the performance of eight open-source and proprietary LLMs for annotating the stance expressed in social media posts.
A significant finding of our study is that the explicitness of text expressing a stance plays a critical role in how faithfully LLMs' stance judgments match humans'.
arXiv Detail & Related papers (2024-06-11T17:26:07Z)
- White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Language agency is an important aspect of evaluating social biases in texts.
Previous research often relies on string-matching techniques to identify agentic and communal words.
We introduce the novel Language Agency Bias Evaluation benchmark.
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
- Whose Side Are You On? Investigating the Political Stance of Large Language Models [56.883423489203786]
We investigate the political alignment of Large Language Models (LLMs) across a spectrum of eight polarizing topics, spanning from abortion to LGBTQ issues.
The findings suggest that users should be mindful when crafting queries, and exercise caution in selecting neutral prompt language.
arXiv Detail & Related papers (2024-03-15T04:02:24Z)
- What Evidence Do Language Models Find Convincing? [103.67867531892988]
We build a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts.
We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions.
Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important.
arXiv Detail & Related papers (2024-02-19T02:15:34Z)
- Scaling Political Texts with Large Language Models: Asking a Chatbot Might Be All You Need [0.0]
We use instruction-tuned Large Language Models (LLMs) to position political texts within policy and ideological spaces.
We illustrate and validate the approach by scaling British party manifestos on the economic, social, and immigration policy dimensions.
The correlation between the position estimates obtained with the best LLMs and benchmarks based on coding by experts, crowdworkers, or roll call votes exceeds .90.
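As a reference for the validation figure quoted above, the Pearson correlation between two sets of position estimates can be computed with a short stdlib-only function; the numbers in the example are made up for illustration, not taken from the paper.

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between paired position estimates."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative only: LLM-based estimates vs. an expert benchmark.
llm_estimates = [0.1, 0.4, 0.5, 0.9]
expert_scores = [0.2, 0.3, 0.6, 0.8]
r = pearson(llm_estimates, expert_scores)
```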
arXiv Detail & Related papers (2023-11-28T09:45:02Z)
- The Perils & Promises of Fact-checking with Large Language Models [55.869584426820715]
Large Language Models (LLMs) are increasingly trusted to write academic papers, lawsuits, and news articles.
We evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions.
Our results show the enhanced prowess of LLMs when equipped with contextual information.
While LLMs show promise in fact-checking, caution is essential due to inconsistent accuracy.
arXiv Detail & Related papers (2023-10-20T14:49:47Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Large language models can rate news outlet credibility [6.147741269183294]
Large language models (LLMs) have shown exceptional performance in various natural language processing tasks.
Here we assess whether ChatGPT, a prominent LLM, can evaluate the credibility of news outlets.
Our results show that these ratings correlate with those from human experts.
arXiv Detail & Related papers (2023-04-01T05:04:06Z)
- Tweets2Stance: Users stance detection exploiting Zero-Shot Learning Algorithms on Tweets [0.06372261626436675]
The aim of the study is to predict the stance of a party p toward each statement s by exploiting what the party's Twitter account wrote on Twitter.
Results obtained from multiple experiments show that Tweets2Stance can correctly predict the stance with a general minimum MAE of 1.13, which is a great achievement considering the task complexity.
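The MAE figure reported above is the average absolute gap between predicted and true stance scores. A minimal illustration, with made-up numbers rather than the paper's data:

```python
def mae(predicted: list[float], actual: list[float]) -> float:
    """Mean absolute error between predicted and true stance scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Example: stances on a 1-5 agreement scale for three statements
# (illustrative values only).
error = mae([2.0, 4.0, 5.0], [3.0, 4.0, 3.0])  # -> 1.0
```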
arXiv Detail & Related papers (2022-04-22T14:00:11Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
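SHAP, mentioned above, attributes a model's prediction to its input features via Shapley values from cooperative game theory. The sketch below computes exact Shapley values by brute-force subset enumeration for a tiny model — a stdlib-only illustration of the underlying idea, not the optimized TreeSHAP algorithm the SHAP library uses in practice. Missing features are replaced by baseline values, one common convention.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for model f at point x, relative to a baseline.

    Features outside the coalition S are replaced by their baseline values.
    Exponential in the number of features; only viable for tiny models.
    """
    n = len(x)

    def v(S):
        # Model output with only features in S "present".
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                S = set(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi += w * (v(S | {i}) - v(S))
        phis.append(phi)
    return phis
```

For an additive model such as `f(z) = 2*z[0] + 3*z[1]` with a zero baseline, each feature's Shapley value is exactly its own contribution, which makes the function easy to sanity-check.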
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.