Related papers: Hallucinations in Bibliographic Recommendation: Citation Frequency as a Proxy for Training Data Redundancy

URL: http://arxiv.org/abs/2510.25378v1
Date: Wed, 29 Oct 2025 10:51:35 GMT
Title: Hallucinations in Bibliographic Recommendation: Citation Frequency as a Proxy for Training Data Redundancy
Authors: Junichiro Niimi
Abstract summary: Large language models (LLMs) have been increasingly applied to a wide range of tasks. This study hypothesizes that an LLM's ability to correctly produce bibliographic information depends on whether the underlying knowledge is generated or memorized.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. While they have also been used to assist in bibliographic recommendation, the hallucination of non-existent papers remains a major issue. Building on prior studies, this study hypothesizes that an LLM's ability to correctly produce bibliographic information depends on whether the underlying knowledge is generated or memorized, with highly cited papers (i.e., those that appear more frequently in the training corpus) showing lower hallucination rates. We therefore treat citation count as a proxy for training data redundancy (i.e., the frequency with which a given bibliographic record is represented in the pretraining corpus) and investigate how citation frequency affects hallucinated references in LLM outputs. Using GPT-4.1, we generated and manually verified 100 bibliographic records across twenty computer-science domains, and measured factual consistency via cosine similarity between generated and authentic metadata. The results revealed that (i) hallucination rates vary across research domains, (ii) citation count is strongly correlated with factual accuracy, and (iii) bibliographic information is memorized almost verbatim beyond approximately 1,000 citations. These findings suggest that highly cited papers are retained nearly verbatim in the model, indicating a threshold where generalization shifts into memorization.
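
The abstract's consistency measure, cosine similarity between generated and authentic metadata, can be illustrated with a minimal sketch. The abstract does not say how the metadata is vectorized, so the token-count vectors used here are an assumption, and the function names and sample records are hypothetical rather than taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between token-count vectors of two strings.

    Assumption: bag-of-words counts stand in for whatever vector
    representation the paper actually uses.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty record shares nothing with any other record
    return dot / (norm_a * norm_b)


# Hypothetical records for illustration only (not data from the paper).
authentic = "A Hypothetical Study of Graph Learning, A. Author, 2020"
generated_exact = "A Hypothetical Study of Graph Learning, A. Author, 2020"
generated_partial = "A Hypothetical Study of Graph Mining, A. Author, 2018"
generated_unrelated = "Unrelated Survey on Quantum Compilers, B. Writer, 2015"

for label, record in [
    ("exact", generated_exact),
    ("partial", generated_partial),
    ("unrelated", generated_unrelated),
]:
    print(label, round(cosine_similarity(authentic, record), 3))
```

Under this assumption, a record reproduced verbatim scores close to 1, a record with some wrong fields scores in between, and a record sharing no tokens with the authentic one scores 0.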

Related papers

CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era [51.63024682584688]
Large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. We present the first comprehensive benchmark and detection framework for hallucinated citations in scientific writing. Our framework significantly outperforms prior methods in both accuracy and interpretability.
arXiv Detail & Related papers (2026-02-26T19:17:39Z)
Hallucinate or Memorize? The Two Sides of Probabilistic Learning in Large Language Models [0.0]
Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. While they have also been used to assist in citation recommendation, the hallucination of non-existent papers remains a major issue. This study hypothesizes that an LLM's ability to correctly produce records depends on whether the underlying knowledge is generated or memorized.
arXiv Detail & Related papers (2025-11-12T01:29:12Z)
VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification [107.75781898355562]
We introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution. We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.
arXiv Detail & Related papers (2025-10-13T13:38:54Z)
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models [44.31597857713689]
We introduce Active Indexing for the first stage, which creates generalizable, source-anchored bindings. Experiments with Qwen-2.5-7B&3B show that Active Indexing consistently outperforms a Passive Indexing baseline. Internal citations complement external ones by making the model more robust to retrieval noise.
arXiv Detail & Related papers (2025-06-21T04:48:05Z)
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices? [1.130790932059036]
We show that large language models (LLMs) reinforce the Matthew effect in citations by consistently favoring highly cited papers. We analyze 274,951 references generated by GPT-4o for 10,000 papers.
arXiv Detail & Related papers (2025-04-03T17:04:56Z)
The Noisy Path from Source to Citation: Measuring How Scholars Engage with Past Research [20.649638393774048]
We introduce a computational pipeline to quantify citation fidelity at scale. Using full texts of papers, the pipeline identifies citations in citing papers and the corresponding claims in cited papers. Using a quasi-experiment, we establish the "telephone effect" - when citing papers have low fidelity to the original claim, future papers that cite the citing paper and the original have lower fidelity to the original.
arXiv Detail & Related papers (2025-02-27T22:47:03Z)
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation [51.8188846284153]
Attributed Text Generation (ATG) is proposed to enhance credibility and verifiability in RAG systems. This paper proposes ReClaim, a fine-grained ATG method that alternates the generation of references and answers step by step. With extensive experiments, we verify the effectiveness of ReClaim in extensive settings, achieving a citation accuracy rate of 90%.
arXiv Detail & Related papers (2024-07-01T20:47:47Z)
ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation. Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level. We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
arXiv Detail & Related papers (2024-06-19T09:16:14Z)
Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias [1.7812428873698407]
Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of Large Language Models (LLMs) introduces a new dynamic to these practices. Here, we analyze these characteristics in an experiment using a dataset from AAAI, NeurIPS, ICML, and ICLR.
arXiv Detail & Related papers (2024-05-24T17:34:32Z)
Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks. Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
arXiv Detail & Related papers (2022-02-23T09:05:28Z)
Counterfactual Memorization in Neural Language Models [91.8747020391287]
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. An open question in previous studies of language model memorization is how to filter out "common" memorization. We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training.
arXiv Detail & Related papers (2021-12-24T04:20:57Z)
Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph. We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains. Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.