Assessing the Potential of Generative Agents in Crowdsourced Fact-Checking
- URL: http://arxiv.org/abs/2504.19940v1
- Date: Thu, 24 Apr 2025 18:49:55 GMT
- Title: Assessing the Potential of Generative Agents in Crowdsourced Fact-Checking
- Authors: Luigia Costabile, Gian Marco Orlando, Valerio La Gatta, Vincenzo Moscato
- Abstract summary: Large Language Models (LLMs) have shown strong performance across fact-checking tasks. This paper investigates whether generative agents can meaningfully contribute to fact-checking tasks traditionally reserved for human crowds. Agent crowds outperform human crowds in truthfulness classification, exhibit higher internal consistency, and show reduced susceptibility to social and cognitive biases.
- Score: 7.946359845249688
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The growing spread of online misinformation has created an urgent need for scalable, reliable fact-checking solutions. Crowdsourced fact-checking - where non-experts evaluate claim veracity - offers a cost-effective alternative to expert verification, despite concerns about variability in quality and bias. Encouraged by promising results in certain contexts, major platforms such as X (formerly Twitter), Facebook, and Instagram have begun shifting from centralized moderation to decentralized, crowd-based approaches. In parallel, advances in Large Language Models (LLMs) have shown strong performance across core fact-checking tasks, including claim detection and evidence evaluation. However, their potential role in crowdsourced workflows remains unexplored. This paper investigates whether LLM-powered generative agents - autonomous entities that emulate human behavior and decision-making - can meaningfully contribute to fact-checking tasks traditionally reserved for human crowds. Using the protocol of La Barbera et al. (2024), we simulate crowds of generative agents with diverse demographic and ideological profiles. Agents retrieve evidence, assess claims along multiple quality dimensions, and issue final veracity judgments. Our results show that agent crowds outperform human crowds in truthfulness classification, exhibit higher internal consistency, and show reduced susceptibility to social and cognitive biases. Compared to humans, agents rely more systematically on informative criteria such as Accuracy, Precision, and Informativeness, suggesting a more structured decision-making process. Overall, our findings highlight the potential of generative agents as scalable, consistent, and less biased contributors to crowd-based fact-checking systems.
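As a rough illustration of the workflow the abstract describes (persona-conditioned agents rate a claim along quality dimensions, then a crowd-level verdict is aggregated), the sketch below simulates a small agent crowd. It assumes an OpenAI-style chat-completion backend; the persona fields, prompt wording, dimension list, and majority-vote aggregation are illustrative assumptions rather than the exact protocol of La Barbera et al. (2024), and evidence is taken as given rather than retrieved by the agents.

```python
"""Minimal sketch of an LLM 'agent crowd' for claim assessment.

Illustrative only: persona fields, dimensions, prompt wording, and the
aggregation rule are assumptions, not the authors' exact protocol.
"""
import json
import random
import statistics
from dataclasses import dataclass

from openai import OpenAI  # any chat-completion backend would do

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DIMENSIONS = ["Accuracy", "Precision", "Informativeness"]  # illustrative subset


@dataclass
class Persona:
    age: int
    political_leaning: str  # e.g. "liberal", "moderate", "conservative"
    education: str


def assess_claim(persona: Persona, claim: str, evidence: str) -> dict:
    """Ask one persona-conditioned agent to rate the claim and judge truthfulness."""
    prompt = (
        f"You are a {persona.age}-year-old, {persona.political_leaning} person "
        f"with {persona.education} education taking part in a fact-checking task.\n"
        f"Claim: {claim}\nEvidence: {evidence}\n"
        f"Rate the claim from 0 to 5 on each of {DIMENSIONS}, then give a final "
        f"truthfulness label (true/false). Reply as JSON with keys 'ratings' and 'label'."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


def crowd_verdict(claim: str, evidence: str, n_agents: int = 10) -> str:
    """Sample a demographically diverse crowd and aggregate labels by majority vote."""
    personas = [
        Persona(
            age=random.randint(18, 70),
            political_leaning=random.choice(["liberal", "moderate", "conservative"]),
            education=random.choice(["high school", "bachelor", "graduate"]),
        )
        for _ in range(n_agents)
    ]
    labels = [assess_claim(p, claim, evidence)["label"] for p in personas]
    return statistics.mode(labels)
```

In the paper itself, rating scales and aggregation follow the crowdsourcing protocol of La Barbera et al. (2024); the majority vote above is only a stand-in for that aggregation step.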
Related papers
- MisinfoEval: Generative AI in the Era of "Alternative Facts" [50.069577397751175]
We introduce a framework for generating and evaluating large language model (LLM) based misinformation interventions.
We present (1) an experiment with a simulated social media environment to measure effectiveness of misinformation interventions, and (2) a second experiment with personalized explanations tailored to the demographics and beliefs of users.
Our findings confirm that LLM-based interventions are highly effective at correcting user behavior.
arXiv Detail & Related papers (2024-10-13T18:16:50Z)
- AI Agents That Matter [11.794931453828974]
AI agents are an exciting new research direction, and agent development is driven by benchmarks.
Current agent benchmarks focus narrowly on accuracy without attention to other metrics, and the benchmarking needs of model developers and downstream developers have been conflated.
Many agent benchmarks have inadequate holdout sets, and sometimes none at all.
arXiv Detail & Related papers (2024-07-01T17:48:14Z)
- How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models [95.44559524735308]
Verification based on large language or multimodal models has been proposed to scale up online policing mechanisms for mitigating the spread of false and harmful content.
We test the limits of improving foundation model performance without continual updating through an initial study of knowledge transfer.
Our results on two recent multi-modal fact-checking benchmarks, Mocheg and Fakeddit, indicate that knowledge transfer strategies can improve Fakeddit performance over the state-of-the-art by up to 1.7% and Mocheg performance by up to 2.9%.
arXiv Detail & Related papers (2024-06-29T08:39:07Z)
- Mitigating Biases in Collective Decision-Making: Enhancing Performance in the Face of Fake News [4.413331329339185]
We study the influence these biases can have in the pervasive problem of fake news by evaluating human participants' capacity to identify false headlines.
By focusing on headlines involving sensitive characteristics, we gather a comprehensive dataset to explore how human responses are shaped by their biases.
We show that demographic factors, headline categories, and the manner in which information is presented significantly influence errors in human judgment.
arXiv Detail & Related papers (2024-03-11T12:08:08Z)
- How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation [46.42384207122049]
We design SimulateBench to evaluate the believability of large language models (LLMs) when simulating human behaviors.
Based on SimulateBench, we evaluate the performances of 10 widely used LLMs when simulating characters.
arXiv Detail & Related papers (2023-12-28T16:51:11Z)
- Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims, which is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z)
- Measuring the Effect of Influential Messages on Varying Personas [67.1149173905004]
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
arXiv Detail & Related papers (2023-05-25T21:01:00Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should account for several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and the agent architecture.
We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z)
- Leveraging Clickstream Trajectories to Reveal Low-Quality Workers in Crowdsourced Forecasting Platforms [22.995941896769843]
We propose the use of a computational framework to identify clusters of underperforming workers using clickstream trajectories.
The framework can reveal different types of underperformers, such as workers with forecasts whose accuracy is far from the consensus of the crowd.
Our study suggests that clickstream clustering and analysis are fundamental tools to diagnose the performance of crowdworkers in platforms leveraging the wisdom of crowds.
arXiv Detail & Related papers (2020-09-04T00:26:38Z)
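To make the clickstream-clustering idea in the entry above concrete, here is a minimal sketch that groups workers by features derived from their click events; the event fields, the chosen features, and the use of k-means are assumptions for demonstration, not the paper's actual framework.

```python
# Illustrative sketch: cluster crowdworkers by clickstream-derived features
# so that small or atypical clusters can be inspected as candidate underperformers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def featurize(clickstream: list[dict]) -> np.ndarray:
    """Turn one worker's raw click events (hypothetical 'dwell_seconds' and
    'page' fields) into a fixed-length feature vector."""
    dwell = [event["dwell_seconds"] for event in clickstream]
    pages = {event["page"] for event in clickstream}
    return np.array([len(clickstream), float(np.mean(dwell)), len(pages)])


def cluster_workers(streams: dict[str, list[dict]], k: int = 4) -> dict[str, int]:
    """Assign each worker to a cluster over standardized clickstream features."""
    worker_ids = list(streams)
    features = np.vstack([featurize(streams[w]) for w in worker_ids])
    scaled = StandardScaler().fit_transform(features)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    return dict(zip(worker_ids, labels))
```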