Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing
- URL: http://arxiv.org/abs/2406.07483v1
- Date: Tue, 11 Jun 2024 17:26:07 GMT
- Title: Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing
- Authors: Mao Li, Frederick Conrad
- Abstract summary: The use of Large Language Models (LLMs) for automated text annotation of social media posts has garnered significant interest.
We analyze the performance of eight open-source and proprietary LLMs for annotating the stance expressed in social media posts.
A significant finding of our study is that the explicitness of text expressing a stance plays a critical role in how faithfully LLMs' stance judgments match humans'.
- Score: 2.936331223824117
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the rapidly evolving landscape of Natural Language Processing (NLP), the use of Large Language Models (LLMs) for automated text annotation in social media posts has garnered significant interest. Despite the impressive innovations in developing LLMs like ChatGPT, their efficacy and accuracy as annotation tools are not well understood. In this paper, we analyze the performance of eight open-source and proprietary LLMs for annotating the stance expressed in social media posts, benchmarking their performance against human annotators' (i.e., crowd-sourced) judgments. Additionally, we investigate the conditions under which LLMs are likely to disagree with human judgment. A significant finding of our study is that the explicitness of text expressing a stance plays a critical role in how faithfully LLMs' stance judgments match humans'. We argue that LLMs perform well when human annotators do, and when LLMs fail, it often corresponds to situations in which human annotators struggle to reach an agreement. We conclude with recommendations for a comprehensive approach that combines the precision of human expertise with the scalability of LLM predictions. This study highlights the importance of improving the accuracy and comprehensiveness of automated stance detection, aiming to advance these technologies for more efficient and unbiased analysis of social media.
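The paper itself does not include code; as a rough illustration of the kind of setup it benchmarks (LLM stance labels compared against crowd-sourced judgments), the minimal sketch below assumes an OpenAI-compatible chat endpoint and hypothetical posts, target, and label set. The prompt wording, labels, and agreement metric are assumptions for illustration, not the authors' protocol.

```python
# Minimal sketch (not the authors' code): annotate stance with an LLM and
# compare against crowd-sourced labels. Assumes the `openai` and
# `scikit-learn` packages; prompt wording, label set, and example data are
# illustrative assumptions.
from openai import OpenAI
from sklearn.metrics import cohen_kappa_score

LABELS = ["FAVOR", "AGAINST", "NONE"]  # assumed stance label set
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def annotate_stance(post: str, target: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model for a single stance label toward `target`."""
    prompt = (
        f"Classify the stance of the following social media post toward "
        f"'{target}'. Answer with exactly one word: FAVOR, AGAINST, or NONE.\n\n"
        f"Post: {post}"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep labels as deterministic as possible
    )
    answer = resp.choices[0].message.content.strip().upper()
    return answer if answer in LABELS else "NONE"  # fall back on unparsable output

# Hypothetical example: compare LLM labels with majority-vote crowd labels.
posts = ["Mail-in ballots make voting easier for everyone.",
         "No way I'm trusting those machines with my vote."]
crowd_labels = ["FAVOR", "AGAINST"]
llm_labels = [annotate_stance(p, target="mail-in voting") for p in posts]
print("Cohen's kappa (LLM vs. crowd):", cohen_kappa_score(crowd_labels, llm_labels))
```

In practice, one would aggregate multiple crowd annotations per post and stratify the agreement analysis by how explicitly the stance is expressed, which is the dimension the paper highlights.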
Related papers
- Robots in the Middle: Evaluating LLMs in Dispute Resolution [0.0]
We investigate whether large language models (LLMs) are able to analyze dispute conversations, select suitable intervention types, and generate appropriate intervention messages.
Our results demonstrate the potential of integrating AI in online dispute resolution (ODR) platforms.
arXiv Detail & Related papers (2024-10-09T16:51:10Z)
- The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks.
This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
arXiv Detail & Related papers (2024-10-07T02:30:18Z)
- Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness [30.632260870411177]
Large language models (LLMs) have rapidly penetrated people's work and daily lives over the past few years.
This thesis focuses on the correctness, non-toxicity, and fairness of LLMs from both software testing and natural language processing perspectives.
arXiv Detail & Related papers (2024-08-31T22:21:04Z)
- Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation [15.718288693929019]
Large Language Models (LLMs) achieve state-of-the-art performance on many NLP tasks.
We study whether LLMs can be used as substitutes for human annotators.
We find that LLMs outperform current automatic measures for system-level evaluation but still struggle to provide satisfactory explanations.
arXiv Detail & Related papers (2024-05-22T15:56:52Z)
- The Human Factor in Detecting Errors of Large Language Models: A Systematic Literature Review and Future Research Directions [0.0]
The launch of ChatGPT by OpenAI in November 2022 marked a pivotal moment for Artificial Intelligence.
Large Language Models (LLMs) demonstrate remarkable conversational capabilities across various domains.
These models are susceptible to errors such as "hallucinations" and omissions, generating incorrect or incomplete information.
arXiv Detail & Related papers (2024-03-13T21:39:39Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs' ability for general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations reveal a language model's overall proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or producing factually incorrect information.
This survey presents a comprehensive overview of these alignment technologies.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)
- Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
- Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization [66.08074487429477]
We investigate the stability and reliability of large language models (LLMs) as automatic evaluators for abstractive summarization.
We find that while ChatGPT and GPT-4 outperform the commonly used automatic metrics, they are not ready as human replacements.
arXiv Detail & Related papers (2023-05-22T14:58:13Z)