Citation Failure: Definition, Analysis and Efficient Mitigation
- URL: http://arxiv.org/abs/2510.20303v1
- Date: Thu, 23 Oct 2025 07:47:22 GMT
- Authors: Jan Buchmann, Iryna Gurevych
- Abstract summary: Citations from LLM-based RAG systems are supposed to simplify response verification. This does not hold for citation failure, when a model generates a helpful response but fails to cite complete evidence. We propose to disentangle this from response failure, where the response itself is flawed and citing complete evidence is impossible.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Citations from LLM-based RAG systems are supposed to simplify response verification. However, this does not hold for citation failure, when a model generates a helpful response, but fails to cite complete evidence. In contrast to previous work, we propose to disentangle this from response failure, where the response itself is flawed, and citing complete evidence is impossible. To address citation failure, this work follows a two-step approach: (1) We study when citation failure occurs and (2) how it can be mitigated. For step 1, we extend prior work by investigating how the relation between response and evidence affects citation quality. We introduce CITECONTROL, a benchmark that systematically varies this relation to analyze failure modes. Experiments show that failures increase with relational complexity and suggest that combining citation methods could improve performance, motivating step 2. To improve LLM citation efficiently, we propose CITENTION, a framework integrating generative, attention-based, and retrieval-based methods. Results demonstrate substantial citation improvements on CITECONTROL and in transfer settings. We make our data and code publicly available.
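The citation-quality metrics that recur in these abstracts (e.g. "Citation F1", "citation accuracy") can be illustrated with a minimal sketch. The exact matching criteria differ per benchmark; the function below assumes a simple set overlap between predicted and gold evidence IDs, which may not match any one paper's precise definition.

```python
def citation_f1(predicted: set, gold: set) -> float:
    """Set-overlap citation F1: harmonic mean of precision
    (fraction of cited passages that are gold evidence) and
    recall (fraction of gold evidence that was cited)."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)  # correctly cited passages
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Example: model cites passages {1, 2, 4}; gold evidence is {1, 2, 3}.
# precision = 2/3, recall = 2/3, so F1 = 2/3.
```

Under this reading, a "citation failure" on a helpful response is any case where the cited set falls short of full recall against the gold evidence even though a complete citation exists.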
Related papers
- CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era [51.63024682584688]
Large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. We present the first comprehensive benchmark and detection framework for hallucinated citations in scientific writing. Our framework significantly outperforms prior methods in both accuracy and interpretability.
arXiv Detail & Related papers (2026-02-26T19:17:39Z) - FineRef: Fine-Grained Error Reflection and Correction for Long-Form Generation with Citations [30.28908306106096]
FineRef teaches the model to self-identify and correct two key citation errors, mismatch and irrelevance, on a per-citation basis. FineRef significantly improves both citation performance and answer accuracy. Our 7B model outperforms GPT-4 by up to 18% in Citation F1 and 4% in EM Recall.
arXiv Detail & Related papers (2025-11-18T09:35:12Z) - VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification [107.75781898355562]
We introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution. We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.
arXiv Detail & Related papers (2025-10-13T13:38:54Z) - Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution [8.691344810384114]
Large Language Models (LLMs) must cite human-verifiable sources in high-stakes domains such as healthcare, law, academia, and finance. We introduce two paradigms: Generation-Time Citation (G-Cite), which produces the answer and citations in one pass, and Post-hoc Citation (P-Cite), which adds or verifies citations after drafting. Our results show a consistent trade-off between coverage and citation correctness, with retrieval as the main driver of attribution quality in both paradigms.
arXiv Detail & Related papers (2025-09-25T20:39:26Z) - SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models [51.90867482317985]
SelfCite is a self-supervised approach to generate fine-grained, sentence-level citations for statements in generated responses. The effectiveness of SelfCite is demonstrated by increasing citation F1 up to 5.3 points on the LongBench-Cite benchmark.
arXiv Detail & Related papers (2025-02-13T18:55:13Z) - On the Capacity of Citation Generation by Large Language Models [38.47160164251295]
Retrieval-augmented generation (RAG) appears to be a promising method to alleviate the "hallucination" problem in large language models (LLMs).
arXiv Detail & Related papers (2024-10-15T03:04:26Z) - Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation [51.8188846284153]
Attributed Text Generation (ATG) is proposed to enhance credibility and verifiability in RAG systems. This paper proposes ReClaim, a fine-grained ATG method that alternates the generation of references and answers step by step. With extensive experiments, we verify the effectiveness of ReClaim in diverse settings, achieving a citation accuracy rate of 90%.
arXiv Detail & Related papers (2024-07-01T20:47:47Z) - ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation.
Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level.
We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
arXiv Detail & Related papers (2024-06-19T09:16:14Z) - Learning to Generate Answers with Citations via Factual Consistency Models [28.716998866121923]
Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations.
This paper proposes a weakly-supervised fine-tuning method leveraging factual consistency models (FCMs).
Focused learning is integrated into the objective, directing the fine-tuning process to emphasise the factual unit tokens.
arXiv Detail & Related papers (2024-06-19T00:40:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed summaries (including all information) and is not responsible for any consequences.