Aligning Large Language Model Behavior with Human Citation Preferences
- URL: http://arxiv.org/abs/2602.05205v1
- Date: Thu, 05 Feb 2026 02:02:43 GMT
- Title: Aligning Large Language Model Behavior with Human Citation Preferences
- Authors: Kenichiro Ando, Tatsuya Harada
- Abstract summary: We study the relationship between human citation preferences and the behavior of large-scale language models (LLMs). Our results show that humans most frequently seek citations for medical text, and stronger models display a similar tendency.
- Score: 45.80355133880463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most services built on powerful large-scale language models (LLMs) add citations to their output to enhance credibility. Recent research has paid increasing attention to the question of what reference documents to link to outputs. However, how LLMs recognize cite-worthiness and how this process should be controlled remain underexplored. In this study, we focus on what kinds of content LLMs currently tend to cite and how well that behavior aligns with human preferences. We construct a dataset to characterize the relationship between human citation preferences and LLM behavior. Web-derived texts are categorized into eight citation-motivation types, and pairwise citation preferences are exhaustively evaluated across all type combinations to capture fine-grained contrasts. Our results show that humans most frequently seek citations for medical text, and stronger models display a similar tendency. We also find that current models are as much as $27\%$ more likely than humans to add citations to text that is explicitly marked as needing citations on sources such as Wikipedia, and this overemphasis reduces alignment accuracy. Conversely, models systematically underselect numeric sentences (by $-22.6\%$ relative to humans) and sentences containing personal names (by $-20.1\%$), categories for which humans typically demand citations. Furthermore, experiments with Direct Preference Optimization demonstrate that model behavior can be calibrated to better match human citation preferences. We expect this study to provide a foundation for more fine-grained investigations into LLM citation preferences.
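The calibration step described in the abstract relies on Direct Preference Optimization (DPO) over pairwise human citation judgments. The following is a minimal, hypothetical sketch, not the authors' released code: it assumes each pairwise comparison is recast as a chosen/rejected completion of a shared prompt, and it shows the standard DPO objective that nudges a policy model toward the human-preferred citation decision (function and data names are illustrative).

```python
# Hypothetical sketch of DPO-style calibration of citation preferences.
# Assumes per-example log-probabilities have already been computed with a
# trainable policy model and a frozen reference model; names are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: reward the policy for ranking the
    human-preferred citation decision above the rejected one, measured
    relative to the frozen reference model."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()

# One illustrative preference pair in the paper's pairwise setting:
# the human-preferred sentence (here, a numeric medical claim) is "chosen".
example = {
    "prompt": ("Which sentence needs a citation more?\n"
               "(A) The drug reduced mortality by 30% in the trial.\n"
               "(B) Many people enjoy walking in the park.\n"
               "Answer: "),
    "chosen": "(A)",
    "rejected": "(B)",
}
```

In practice such pairs would simply be handed to an off-the-shelf DPO trainer; the point of the sketch is how pairwise citation judgments map onto the chosen/rejected format that DPO expects.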
Related papers
- CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era [51.63024682584688]
Large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. We present the first comprehensive benchmark and detection framework for hallucinated citations in scientific writing. Our framework significantly outperforms prior methods in both accuracy and interpretability.
arXiv Detail & Related papers (2026-02-26T19:17:39Z) - Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution [8.691344810384114]
Large Language Models (LLMs) must cite human-verifiable sources in high-stakes domains such as healthcare, law, academia, and finance. We introduce two paradigms: Generation-Time Citation (G-Cite), which produces the answer and citations in one pass, and Post-hoc Citation (P-Cite), which adds or verifies citations after drafting. Our results show a consistent trade-off between coverage and citation correctness, with retrieval as the main driver of attribution quality in both paradigms.
arXiv Detail & Related papers (2025-09-25T20:39:26Z) - SCIRGC: Multi-Granularity Citation Recommendation and Citation Sentence Preference Alignment [2.0383262889621867]
We propose the SciRGC framework, which aims to automatically recommend citation articles and generate citation sentences for citation locations within articles. The framework addresses two key challenges in academic citation generation: 1) how to accurately identify the author's citation intent and find relevant citation papers, and 2) how to generate high-quality citation sentences that align with human preferences.
arXiv Detail & Related papers (2025-05-26T15:09:10Z) - SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models [51.90867482317985]
SelfCite is a self-supervised approach to generate fine-grained, sentence-level citations for statements in generated responses. The effectiveness of SelfCite is demonstrated by increasing citation F1 up to 5.3 points on the LongBench-Cite benchmark.
arXiv Detail & Related papers (2025-02-13T18:55:13Z) - HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction [14.731720495144112]
We introduce the novel concept of core citation, which identifies the critical references that go beyond superficial mentions.
We propose HLM-Cite, a Hybrid Language Model workflow for citation prediction.
We evaluate HLM-Cite across 19 scientific fields, demonstrating a 17.6% performance improvement compared with SOTA methods.
arXiv Detail & Related papers (2024-10-10T10:46:06Z) - ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation.
Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level (a rough sketch of this splitting step appears after this list).
We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
arXiv Detail & Related papers (2024-06-19T09:16:14Z) - Learning to Generate Answers with Citations via Factual Consistency Models [28.716998866121923]
Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations.
This paper proposes a weakly-supervised fine-tuning method leveraging factual consistency models (FCMs).
Focused learning is integrated into the objective, directing the fine-tuning process to emphasise the factual unit tokens.
arXiv Detail & Related papers (2024-06-19T00:40:19Z) - Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias [1.7812428873698407]
Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases.
The emergence of Large Language Models (LLMs) introduces a new dynamic to these practices.
Here, we analyze these characteristics in an experiment using a dataset from AAAI, NeurIPS, ICML, and ICLR.
arXiv Detail & Related papers (2024-05-24T17:34:32Z) - Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data [48.409306245463]
We develop models that quote verbatim statements from trusted sources in their pre-training data. The core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora. Experiments show that Quote-Tuning significantly increases verbatim quotes from high-quality documents by up to 130% relative to base models.
arXiv Detail & Related papers (2024-04-05T02:27:09Z) - Dissecting Human and LLM Preferences [80.55271307662365]
We find that humans are less sensitive to errors, favor responses that support their stances, and show clear dislike when models admit their limits.
By contrast, advanced LLMs like GPT-4-Turbo emphasize correctness, clarity, and harmlessness more.
We show that preference-based evaluation can be intentionally manipulated.
arXiv Detail & Related papers (2024-02-17T14:34:31Z) - Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z)
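As referenced in the ALiiCE entry above, that evaluation first splits a sentence into atomic claims via dependency analysis. The sketch below is a rough, hypothetical illustration of that splitting idea using spaCy's dependency parse; it is not ALiiCE's actual pipeline (which also scores citation quality per atomic claim), and the clause-splitting heuristic is deliberately simple.

```python
# Hypothetical illustration (not ALiiCE's code): split a sentence into rough
# "atomic claims" by cutting at clause-level coordination in the dependency
# parse -- the kind of preprocessing the ALiiCE summary above describes.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def atomic_claims(sentence: str) -> list[str]:
    doc = nlp(sentence)
    # Clause heads: the sentence root plus any verb coordinated with it.
    heads = [t for t in doc if t.dep_ == "ROOT"]
    heads += [t for t in doc if t.dep_ == "conj" and t.head in heads and t.pos_ == "VERB"]
    claims = []
    for head in heads:
        # Tokens belonging to other clause heads are excluded, and coordinating
        # conjunctions are dropped so clause boundaries read cleanly
        # (a deliberate simplification).
        others = {t for h in heads if h is not head for t in h.subtree}
        tokens = [t for t in head.subtree if t not in others and t.dep_ != "cc"]
        ordered = sorted(tokens, key=lambda t: t.i)
        claims.append(" ".join(t.text for t in ordered))
    return claims

# Example: one coordinated sentence yields two separately checkable claims.
print(atomic_claims("The drug lowered blood pressure and it reduced mortality by 30%."))
```

Outputs are approximate; a real implementation would also handle relative clauses, shared arguments, and nested coordination, and would then score each atomic claim against its cited passages.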