The 17% Gap: Quantifying Epistemic Decay in AI-Assisted Survey Papers
- URL: http://arxiv.org/abs/2601.17431v1
- Date: Sat, 24 Jan 2026 12:00:55 GMT
- Title: The 17% Gap: Quantifying Epistemic Decay in AI-Assisted Survey Papers
- Authors: H. Kemal İlter
- Abstract summary: "Hallucinated papers" are a known artifact, but the systematic degradation of valid citation chains remains unquantified. We conducted a forensic audit of 50 recent survey papers in Artificial Intelligence published between September 2024 and January 2026. We detect a persistent 17.0% Phantom Rate -- citations that cannot be resolved to any digital object despite aggressive forensic recovery.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The adoption of Large Language Models (LLMs) in scientific writing promises efficiency but risks introducing informational entropy. While "hallucinated papers" are a known artifact, the systematic degradation of valid citation chains remains unquantified. We conducted a forensic audit of 50 recent survey papers in Artificial Intelligence (N=5,514 citations) published between September 2024 and January 2026. We utilized a hybrid verification pipeline combining DOI resolution, Crossref metadata analysis, Semantic Scholar queries, and fuzzy text matching to distinguish between formatting errors ("Sloppiness") and verifiable non-existence ("Phantoms"). We detect a persistent 17.0% Phantom Rate -- citations that cannot be resolved to any digital object despite aggressive forensic recovery. Diagnostic categorization reveals three distinct failure modes: pure hallucinations (5.1%), hallucinated identifiers with valid titles (16.4%), and parsing-induced matching failures (78.5%). Longitudinal analysis reveals a flat trend (+0.07 pp/month), suggesting that high-entropy citation practices have stabilized as an endemic feature of the field. The scientific citation graph in AI survey literature exhibits "link rot" at scale. This suggests a mechanism where AI tools act as "lazy research assistants," retrieving correct titles but hallucinating metadata, thereby severing the digital chain of custody required for reproducible science.
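For intuition, here is a minimal sketch of the kind of hybrid verification pipeline the abstract describes, using the public Crossref REST API and plain DOI resolution. The similarity threshold and the three-way `classify_citation` split are illustrative assumptions, not the authors' implementation:

```python
# Illustrative citation-verification sketch: DOI resolution, Crossref title
# lookup, and fuzzy matching to separate "sloppy" from "phantom" citations.
# Endpoints are the public Crossref/DOI services; thresholds are assumptions.
import difflib
import requests

def doi_resolves(doi: str) -> bool:
    """Check whether a DOI resolves to any digital object."""
    r = requests.head(f"https://doi.org/{doi}", allow_redirects=True, timeout=10)
    return r.status_code < 400

def crossref_best_title(title: str) -> str | None:
    """Return the closest bibliographic match Crossref knows for a title."""
    r = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    )
    items = r.json().get("message", {}).get("items", [])
    return items[0]["title"][0] if items and items[0].get("title") else None

def classify_citation(title: str, doi: str | None) -> str:
    """Rough three-way split mirroring the paper's failure modes."""
    match = crossref_best_title(title)
    similarity = difflib.SequenceMatcher(
        None, title.lower(), (match or "").lower()
    ).ratio()
    if doi and doi_resolves(doi) and similarity > 0.9:
        return "verified"
    if similarity > 0.9:  # real title, bad or missing identifier
        return "hallucinated identifier / sloppiness"
    return "phantom"      # no resolvable digital object found
```

In this reading, a "phantom" is a citation for which neither the identifier nor the title recovers a real record, whereas a valid title paired with a dead DOI matches the paper's "hallucinated identifier" mode.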
Related papers
- CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era [51.63024682584688]
Large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. We present the first comprehensive benchmark and detection framework for hallucinated citations in scientific writing. Our framework significantly outperforms prior methods in both accuracy and interpretability.
arXiv Detail & Related papers (2026-02-26T19:17:39Z) - Identifying Evidence-Based Nudges in Biomedical Literature with Large Language Models [2.2015514798912412]
We present a scalable, AI-powered system that identifies and extracts evidence-based behavioral nudges from unstructured biomedical literature. Nudges are subtle, non-coercive interventions that influence behavior without limiting choice, showing strong impact on health outcomes like medication adherence.
arXiv Detail & Related papers (2026-02-10T22:36:07Z) - GhostCite: A Large-Scale Analysis of Citation Validity in the Age of Large Language Models [22.147294042024836]
Citations provide the basis for trusting scientific claims; when they are invalid or fabricated, this trust collapses. With the advent of Large Language Models (LLMs), this risk has intensified. We develop CiteVerifier, an open-source framework for large-scale citation verification.
arXiv Detail & Related papers (2026-02-06T14:08:34Z) - Compound Deception in Elite Peer Review: A Failure Mode Taxonomy of 100 Fabricated Citations at NeurIPS 2025 [0.0]
Large language models (LLMs) are increasingly used in academic writing, yet they frequently hallucinate by generating citations to sources that do not exist. This study analyzes 100 AI-generated hallucinated citations that appeared in papers accepted by the 2025 Conference on Neural Information Processing Systems. Despite review by 3-5 expert researchers per paper, these fabricated citations evaded detection, appearing in 53 published papers.
arXiv Detail & Related papers (2026-02-05T17:43:35Z) - SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature [92.88058660627678]
"Fish-in-the-Ocean" (FITO) paradigm requires models to construct explicit cross-modal evidence chains within scientific documents.<n>We construct SIN-Bench with four progressive tasks covering evidence discovery (SIN-Find), hypothesis verification (SIN-Verify), grounded QA (SIN-QA) and evidence-anchored synthesis (SIN-Summary)<n>We introduce "No Evidence, No Score", scoring predictions when grounded to verifiable anchors and diagnosing evidence quality via matching, relevance, and logic.
arXiv Detail & Related papers (2026-01-15T06:25:25Z) - Causal-Enhanced AI Agents for Medical Research Screening [0.0]
Systematic reviews are essential for evidence-based medicine, but reviewing 1.5 million+ annual publications manually is infeasible. We present a causal graph-enhanced retrieval-augmented generation system integrating explicit causal reasoning with dual-level knowledge graphs. Our approach enforces evidence-first protocols where every causal claim traces to retrieved literature and automatically generates directed acyclic graphs visualizing intervention-outcome pathways.
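A minimal sketch of the DAG-generation step described above, assuming intervention-outcome pairs have already been extracted from retrieved literature; the use of networkx and the example triples are illustrative, not the paper's pipeline:

```python
# Illustrative construction of a directed acyclic graph of
# intervention -> outcome pathways, each edge tagged with its source claim.
import networkx as nx

def build_pathway_dag(claims: list[tuple[str, str, str]]) -> nx.DiGraph:
    """claims: (intervention, outcome, citation) triples from retrieval."""
    dag = nx.DiGraph()
    for intervention, outcome, citation in claims:
        dag.add_edge(intervention, outcome, evidence=citation)
    # Evidence-first protocol: every edge must trace to retrieved literature.
    assert all("evidence" in d for _, _, d in dag.edges(data=True))
    assert nx.is_directed_acyclic_graph(dag), "cycles violate the DAG premise"
    return dag

dag = build_pathway_dag([
    ("statin therapy", "reduced LDL", "doi:10.1000/example1"),    # hypothetical
    ("reduced LDL", "fewer cardiac events", "doi:10.1000/example2"),
])
```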
arXiv Detail & Related papers (2026-01-06T08:41:16Z) - The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems [0.0]
We apply hallucination prediction to RAG detection, transforming scores into decision sets with finite-sample coverage guarantees. We analyze this failure through the lens of distributional tails, showing that while NLI models achieve acceptable AUC (0.81), the "hardest" hallucinations are semantically indistinguishable from faithful responses.
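A minimal split-conformal sketch of how detector scores become decision sets with finite-sample coverage, assuming a held-out calibration set of nonconformity scores for known-faithful responses; the detector itself and the alpha level are placeholders:

```python
# Illustrative split conformal prediction: calibrate a threshold so that
# faithful responses are wrongly excluded at most ~alpha of the time.
import math

def conformal_threshold(calibration_scores: list[float], alpha: float) -> float:
    """Finite-sample quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha
        return float("inf")
    return sorted(calibration_scores)[k - 1]

def decision_set(score: float, threshold: float) -> set[str]:
    """One-sided gate calibrated on faithful responses only: scores above
    the threshold rule out 'faithful'; below it, neither label is ruled out."""
    return {"hallucinated"} if score > threshold else {"faithful", "hallucinated"}
```

The two-element set below the threshold is what makes the guarantee honest: without calibration data for hallucinated responses, the method can only ever rule "faithful" out, never in.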
arXiv Detail & Related papers (2025-12-17T04:22:28Z) - HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis [55.2480439325792]
HySemRAG is a framework that combines Extract, Transform, Load (ETL) pipelines with Retrieval-Augmented Generation (RAG). The system addresses limitations in existing RAG architectures through a multi-layered approach.
arXiv Detail & Related papers (2025-08-01T20:30:42Z) - THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning? [16.91455372359864]
We introduce THE-Tree (Technology History Evolution Tree), a computational framework that constructs such domain-specific evolution trees from scientific literature.
arXiv Detail & Related papers (2025-06-26T20:44:51Z) - Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools [32.78336381381673]
We report on the first preregistered empirical evaluation of AI-driven legal research tools.
We find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time.
These findings provide evidence to inform the responsibilities of legal professionals in supervising and verifying AI outputs.
arXiv Detail & Related papers (2024-05-30T17:56:05Z) - Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generate factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z) - AI Hallucinations: A Misnomer Worth Clarifying [4.880243880711163]
We present and analyze definitions obtained across all databases, categorize them based on their applications, and extract key points within each category.
Our results highlight a lack of consistency in how the term is used, but also help identify several alternative terms in the literature.
arXiv Detail & Related papers (2024-01-09T01:49:41Z) - A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of
LLMs by Validating Low-Confidence Generation [76.34411067299331]
Large language models often 'hallucinate', which critically hampers their reliability.
We propose an approach that actively detects and mitigates hallucinations during the generation process.
We show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average.
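A rough sketch of such an active detect-and-mitigate loop, assuming access to per-token probabilities from the generator and an external fact-checking routine; `generate_with_probs` and `validate_claim` are hypothetical stand-ins, not the paper's actual interfaces:

```python
# Illustrative active hallucination mitigation: flag the least-confident
# token during generation, validate the output externally, and regenerate
# if validation fails. Both model hooks below are hypothetical.
LOW_CONFIDENCE = 0.2  # assumed probability threshold

def detect_and_mitigate(prompt: str, generate_with_probs, validate_claim,
                        max_retries: int = 3) -> str:
    """Regenerate outputs whose least-confident span fails validation."""
    text, token_probs = generate_with_probs(prompt)  # [(token, prob), ...]
    for _ in range(max_retries):
        suspect = min(token_probs, key=lambda tp: tp[1])
        if suspect[1] >= LOW_CONFIDENCE or validate_claim(text):
            return text
        # Low confidence and failed validation: retry with the contested
        # span surfaced in the prompt.
        retry_prompt = f"{prompt}\n(Avoid the unsupported claim: {suspect[0]!r})"
        text, token_probs = generate_with_probs(retry_prompt)
    return text
```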
arXiv Detail & Related papers (2023-07-08T14:25:57Z) - Don't Say What You Don't Know: Improving the Consistency of Abstractive
Summarization by Constraining Beam Search [54.286450484332505]
We analyze the connection between hallucinations and training data, and find evidence that models hallucinate because they train on target summaries that are unsupported by the source.
We present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.
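A toy version of consistency-constrained decoding in the spirit described above, assuming each beam expansion can be checked against the source text; the word-level grounding check is a drastic simplification of what PINOCCHIO actually does:

```python
# Illustrative constrained beam search: candidate continuations whose
# content words have no support in the source document are pruned, so the
# summarizer cannot "say what it doesn't know". Grossly simplified.
import heapq

def constrained_beam_step(beams, expand, source_words, width=4):
    """beams: [(score, tokens)]; expand(tokens) -> [(logp, word), ...]."""
    candidates = []
    for score, tokens in beams:
        for logp, word in expand(tokens):
            # Constraint: skip continuations absent from the source text.
            if word.isalpha() and word.lower() not in source_words:
                continue
            candidates.append((score + logp, tokens + [word]))
    return heapq.nlargest(width, candidates, key=lambda c: c[0])
```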
arXiv Detail & Related papers (2022-03-16T07:13:52Z) - Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks.
Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of a citation based on its citation text.
arXiv Detail & Related papers (2022-02-23T09:05:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.