Related papers: Understanding Inequality of LLM Fact-Checking over Geographic Regions with Agent and Retrieval models

Understanding Inequality of LLM Fact-Checking over Geographic Regions with Agent and Retrieval models

URL: http://arxiv.org/abs/2503.22877v1
Date: Fri, 28 Mar 2025 21:07:43 GMT
Title: Understanding Inequality of LLM Fact-Checking over Geographic Regions with Agent and Retrieval models
Authors: Bruno Coelho, Shujaat Mirza, Yuyuan Cui, Christina Pöpper, Damon McCoy,
Abstract summary: We evaluate the factual accuracy of open and private models across a diverse set of regions and scenarios.<n>Our findings reveal that regardless of the scenario and LLM used, statements from the Global North perform substantially better than those from the Global South.
Score: 7.604241782666465
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fact-checking is a potentially useful application of Large Language Models (LLMs) to combat the growing dissemination of disinformation. However, the performance of LLMs varies across geographic regions. In this paper, we evaluate the factual accuracy of open and private models across a diverse set of regions and scenarios. Using a dataset containing 600 fact-checked statements balanced across six global regions we examine three experimental setups of fact-checking a statement: (1) when just the statement is available, (2) when an LLM-based agent with Wikipedia access is utilized, and (3) as a best case scenario when a Retrieval-Augmented Generation (RAG) system provided with the official fact check is employed. Our findings reveal that regardless of the scenario and LLM used, including GPT-4, Claude Sonnet, and LLaMA, statements from the Global North perform substantially better than those from the Global South. Furthermore, this gap is broadened for the more realistic case of a Wikipedia agent-based system, highlighting that overly general knowledge bases have a limited ability to address region-specific nuances. These results underscore the urgent need for better dataset balancing and robust retrieval strategies to enhance LLM fact-checking capabilities, particularly in geographically diverse contexts.

Related papers

Towards Automated Fact-Checking of Real-World Claims: Exploring Task Formulation and Assessment with LLMs [32.45604456988931]
This study establishes baseline comparisons for Automated Fact-Checking (AFC) using Large Language Models (LLMs) We evaluate Llama-3 models of varying sizes on 17,856 claims collected from PolitiFact (2007-2024) using evidence retrieved via restricted web searches. Our results show that larger LLMs consistently outperform smaller LLMs in classification accuracy and justification quality without fine-tuning.
arXiv Detail & Related papers (2025-02-13T02:51:17Z)
What can LLM tell us about cities? [6.405546719612814]
This study explores the capabilities of large language models (LLMs) in providing knowledge about cities and regions on a global scale. Experiments reveal that LLMs embed a broad but varying degree of knowledge across global cities, with ML models trained on LLM-derived features consistently leading to improved predictive accuracy.
arXiv Detail & Related papers (2024-11-25T09:07:56Z)
GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input.<n>GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak)
arXiv Detail & Related papers (2024-10-11T03:05:06Z)
Knowing When to Ask -- Bridging Large Language Models and Data [3.111987311375933]
Large Language Models (LLMs) are prone to generating factually incorrect information when responding to queries that involve numerical and statistical data or other timely facts. We present an approach for enhancing the accuracy of LLMs by integrating them with Data Commons.
arXiv Detail & Related papers (2024-09-10T17:51:21Z)
Large Language Models are Zero-Shot Next Location Predictors [4.315451628809687]
Large Language Models (LLMs) have shown good generalization and reasoning capabilities. LLMs can obtain accuracies up to 36.2%, a significant improvement of almost 640% when compared to other models specifically designed for human mobility.
arXiv Detail & Related papers (2024-05-31T16:07:33Z)
DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection [50.805599761583444]
Large language models are limited by challenges in factuality and hallucinations to be directly employed off-the-shelf for judging the veracity of news articles. We propose Dell that identifies three key stages in misinformation detection where LLMs could be incorporated as part of the pipeline.
arXiv Detail & Related papers (2024-02-16T03:24:56Z)
Global-Liar: Factuality of LLMs over Time and Geographic Regions [3.715487408753612]
This study evaluates the factual accuracy, stability, and biases in widely adopted GPT models, including GPT-3.5 and GPT-4. We introduce 'Global-Liar,' a dataset uniquely balanced in terms of geographic and temporal representation.
arXiv Detail & Related papers (2024-01-31T13:57:24Z)
Distortions in Judged Spatial Relations in Large Language Models [45.875801135769585]
GPT-4 exhibited superior performance with 55 percent accuracy, followed by GPT-3.5 at 47 percent, and Llama-2 at 45 percent. The models identified the nearest cardinal direction in most cases, reflecting their associative learning mechanism.
arXiv Detail & Related papers (2024-01-08T20:08:04Z)
Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks. How do we evaluate the capabilities of LLMs to consistently produce factually correct answers? We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z)
GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models. We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods. Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z)
Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models [75.75038268227554]
Self-Checker is a framework comprising a set of plug-and-play modules that facilitate fact-checking. This framework provides a fast and efficient way to construct fact-checking systems in low-resource environments.
arXiv Detail & Related papers (2023-05-24T01:46:07Z)
LLMMaps -- A Visual Metaphor for Stratified Evaluation of Large Language Models [13.659853119356507]
Large Language Models (LLMs) have revolutionized natural language processing and demonstrated impressive capabilities in various tasks. They are prone to hallucinations, where the model exposes incorrect or false information in its responses. We propose LLMMaps as a novel visualization technique that enables users to evaluate LLMs' performance with respect to Q&A datasets.
arXiv Detail & Related papers (2023-04-02T05:47:09Z)
Jalisco's multiclass land cover analysis and classification using a novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolution Neural Network (ConvNet) to make LC classification and analysis. In this work, we combine three real-world open data sources to obtain 13 channels. Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar.
arXiv Detail & Related papers (2022-01-26T14:58:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.