Related papers: Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews

Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews

URL: http://arxiv.org/abs/2602.22221v1
Date: Mon, 15 Dec 2025 15:36:13 GMT
Title: Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews
Authors: Geng Liu, Junjie Mu, Li Feng, Mengxiao Zhu, Francesco Pierri,
Abstract summary: Large Language Models (LLMs) are increasingly integrated into search services, providing direct answers.<n>Yet their factual reliability in non-English web ecosystems remains poorly understood.<n>We introduce a fact-checking dataset of 12161 Chinese Yes/No questions derived from real-world online search logs.
Score: 5.655762029601206
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are increasingly integrated into search services, providing direct answers that can reduce users' reliance on traditional result pages. Yet their factual reliability in non-English web ecosystems remains poorly understood, particularly when answering real user queries. We introduce a fact-checking dataset of 12~161 Chinese Yes/No questions derived from real-world online search logs and develop a unified evaluation pipeline to compare three information-access paradigms: traditional search engines, standalone LLMs, and AI-generated overview modules. Our analysis reveals substantial differences in factual accuracy and topic-level variability across systems. By combining this performance with real-world Baidu Index statistics, we further estimate potential exposure to incorrect factual information of Chinese users across regions. These findings highlight structural risks in AI-mediated search and underscore the need for more reliable and transparent information-access tools for the digital world.

Related papers

Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval [60.25608870901428]
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs)<n>We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source robustness.
arXiv Detail & Related papers (2026-03-05T18:42:51Z)
GISA: A Benchmark for General Information-Seeking Assistant [102.30831921333755]
GISA is a benchmark for General Information-Seeking Assistants comprising 373 human-crafted queries.<n>It integrates both deep reasoning and broad information aggregation within unified tasks, and includes a live subset with periodically updated answers to resist memorization.<n>Experiments on mainstream LLMs and commercial search products reveal that even the best-performing model achieves only 19.30% exact match score.
arXiv Detail & Related papers (2026-02-09T11:44:15Z)
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.<n>We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.<n>We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z)
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark [50.89916747049978]
Existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope.<n>We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data.<n>We propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs.
arXiv Detail & Related papers (2025-06-12T17:57:05Z)
Search Arena: Analyzing Search-Augmented LLMs [61.28673331156436]
We introduce Search Arena, a crowd-sourced, large-scale, human-preference dataset of over 24,000 paired multi-turn user interactions.<n>The dataset spans diverse intents and languages, and contains full system traces with around 12,000 human preference votes.<n>Our analysis reveals that user preferences are influenced by the number of citations, even when the cited content does not directly support the attributed claims.
arXiv Detail & Related papers (2025-06-05T17:59:26Z)
Bias-Aware Agent: Enhancing Fairness in AI-Driven Knowledge Retrieval [0.0]
This study introduces a novel approach to bias-aware knowledge retrieval by leveraging agentic framework and the innovative use of bias detectors.<n>By empowering users with transparency and awareness, this approach aims to foster more equitable information systems.
arXiv Detail & Related papers (2025-03-27T07:54:39Z)
"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models [38.921977141721605]
We introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA.<n>Key features of this benchmark include a focus on the Chinese language, diverse knowledge types, a multi-hop question construction, high-quality data, static consistency, and easy-to-evaluate through short answers.
arXiv Detail & Related papers (2025-02-17T12:02:23Z)
Level-Navi Agent: A Framework and benchmark for Chinese Web Search Agents [9.003325286793288]
Large language models (LLMs), adopted to understand human language, drive the development of artificial intelligence (AI) web search agents.<n>We propose a general-purpose and training-free web search agent by level-aware navigation, Level-Navi Agent, accompanied by a well-annotated dataset (Web24) and a suitable evaluation metric.
arXiv Detail & Related papers (2024-12-20T08:03:12Z)
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models [55.903148392998965]
We introduce LOKI, a novel benchmark designed to evaluate the ability of LMMs to detect synthetic data across multiple modalities.<n>The benchmark includes coarse-grained judgment and multiple-choice questions, as well as fine-grained anomaly selection and explanation tasks.<n>We evaluate 22 open-source LMMs and 6 closed-source models on LOKI, highlighting their potential as synthetic data detectors and also revealing some limitations in the development of LMM capabilities.
arXiv Detail & Related papers (2024-10-13T05:26:36Z)
Neurosymbolic AI approach to Attribution in Large Language Models [5.3454230926797734]
Neurosymbolic AI (NesyAI) combines the strengths of neural networks with structured symbolic reasoning. This paper explores how NesyAI frameworks can enhance existing attribution models, offering more reliable, interpretable, and adaptable systems.
arXiv Detail & Related papers (2024-09-30T02:20:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.