What Should I Cite? A RAG Benchmark for Academic Citation Prediction
- URL: http://arxiv.org/abs/2601.14949v2
- Date: Mon, 26 Jan 2026 07:09:08 GMT
- Title: What Should I Cite? A RAG Benchmark for Academic Citation Prediction
- Authors: Leqi Zheng, Jiajun Zhang, Canzhi Chen, Chaokun Wang, Hongwei Li, Yuying Li, Yaoxin Mao, Shannan Yan, Zixin Song, Zhiyuan Feng, Zhaolu Kang, Zirong Chen, Hang Zhang, Qiang Liu, Liang Wang, Ziyang Liu
- Abstract summary: Citation prediction aims to automatically suggest appropriate references, helping scholars navigate the expanding scientific literature. Here we present CiteRAG, the first comprehensive retrieval-augmented generation (RAG)-integrated benchmark for evaluating large language models on academic citation prediction.
- Score: 24.99107629089983
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid growth of Web-based academic publications, more and more papers are being published annually, making it increasingly difficult to find relevant prior work. Citation prediction aims to automatically suggest appropriate references, helping scholars navigate the expanding scientific literature. Here we present \textbf{CiteRAG}, the first comprehensive retrieval-augmented generation (RAG)-integrated benchmark for evaluating large language models on academic citation prediction, featuring a multi-level retrieval strategy, specialized retrievers, and generators. Our benchmark makes four core contributions: (1) We establish two instances of the citation prediction task with different granularity. Task 1 focuses on coarse-grained list-specific citation prediction, while Task 2 targets fine-grained position-specific citation prediction. To enhance these two tasks, we build a dataset containing 7,267 instances for Task 1 and 8,541 instances for Task 2, enabling comprehensive evaluation of both retrieval and generation. (2) We construct a three-level large-scale corpus with 554k papers spanning many major subfields, using an incremental pipeline. (3) We propose a multi-level hybrid RAG approach for citation prediction, fine-tuning embedding models with contrastive learning to capture complex citation relationships, paired with specialized generation models. (4) We conduct extensive experiments across state-of-the-art language models, including closed-source APIs, open-source models, and our fine-tuned generators, demonstrating the effectiveness of our framework. Our open-source toolkit enables reproducible evaluation and focuses on academic literature, providing the first comprehensive evaluation framework for citation prediction and serving as a methodological template for other scientific domains. Our source code and data are released at https://github.com/LQgdwind/CiteRAG.
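The abstract states that the embedding models are fine-tuned with contrastive learning to capture citation relationships, but does not publish the exact objective. A standard choice for this setup is an InfoNCE-style loss over (citing context, cited paper) pairs with in-batch negatives; the sketch below is a minimal pure-Python illustration of that idea, where the function names and toy vectors are assumptions for demonstration, not CiteRAG's actual implementation:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE: pull the citing-context embedding toward the cited paper's
    embedding and push it away from the non-cited (negative) papers."""
    sims = [cosine(query, positive)] + [cosine(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

# Toy embeddings: the truly cited paper lies close to the citing context,
# while the two negatives point in unrelated directions.
query = [0.9, 0.1, 0.0]
cited = [0.8, 0.2, 0.1]
negs = [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
loss = info_nce_loss(query, cited, negs)  # small, since the pair already agrees
```

Minimizing this loss over many (context, citation) pairs is what lets a retriever rank candidate references by embedding similarity at prediction time.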
Related papers
- Talk2Ref: A Dataset for Reference Prediction from Scientific Talks [15.95945265244193]
We introduce Reference Prediction from Talks (RPT), a new task that maps long, unstructured scientific presentations to relevant papers. We present Talk2Ref, the first large-scale dataset of its kind, containing 6,279 talks and 43,429 cited papers. Our results show that fine-tuning on Talk2Ref significantly improves citation prediction performance.
arXiv Detail & Related papers (2025-10-28T14:50:03Z) - Hierarchical Lexical Graph for Enhanced Multi-Hop Retrieval [22.33550491040999]
RAG grounds large language models in external evidence, yet it still falters when answers must be pieced together across semantically distant documents. We build two plug-and-play retrievers: StatementGraphRAG and TopicGraphRAG. Our methods outperform naive chunk-based RAG, achieving an average relative improvement of 23.1% in retrieval recall and correctness.
arXiv Detail & Related papers (2025-06-09T17:58:35Z) - ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations [45.57178343138677]
We introduce ScholarCopilot, a unified framework designed to enhance existing large language models for academic writing. ScholarCopilot determines when to retrieve scholarly references by generating a retrieval token [RET], which is then used to query a citation database. We jointly optimize both the generation and citation tasks within a single framework to improve efficiency.
arXiv Detail & Related papers (2025-04-01T14:12:14Z) - TrustRAG: An Information Assistant with Retrieval Augmented Generation [73.84864898280719]
TrustRAG is a novel framework that enhances RAG from three perspectives: indexing, retrieval, and generation. We open-source the TrustRAG framework and provide a demonstration studio designed for excerpt-based question answering tasks.
arXiv Detail & Related papers (2025-02-19T13:45:27Z) - Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models [0.0]
We propose a Bidirectional Long Short-Term Memory (BiLSTM) network with attention mechanism and contextual information to detect sentences that need citations.
We produce a new, large dataset (PMOA-CITE) based on PubMed Open Access Subset, which is orders of magnitude larger than previous datasets.
arXiv Detail & Related papers (2024-05-20T17:45:36Z) - Context-Enhanced Language Models for Generating Multi-Paper Citations [35.80247519023821]
We propose a method that leverages Large Language Models (LLMs) to generate multi-citation sentences.
Our approach involves a single source paper and a collection of target papers, culminating in a coherent paragraph containing multi-sentence citation text.
arXiv Detail & Related papers (2024-04-22T04:30:36Z) - CorpusLM: Towards a Unified Language Model on Corpus for Knowledge-Intensive Tasks [20.390672895839757]
Retrieval-augmented generation (RAG) has emerged as a popular solution to enhance factual accuracy.
Traditional retrieval modules often rely on a large document index and are disconnected from generative tasks.
We propose CorpusLM, a unified language model that integrates generative retrieval, closed-book generation, and RAG.
arXiv Detail & Related papers (2024-02-02T06:44:22Z) - CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z) - Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings.
Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework.
Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
arXiv Detail & Related papers (2022-12-08T11:53:12Z) - Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z) - What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z) - Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks.
We first represent both natural language query texts and programming language code snippets with unified graph-structured data.
In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
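Several of the entries above share one core loop: generate text until a retrieval trigger fires (ScholarCopilot's [RET] token), query a citation index, then continue generation conditioned on the retrieved papers. The toy sketch below illustrates that loop; the keyword index, overlap scoring, and all identifiers are illustrative stand-ins, not any paper's actual implementation:

```python
RET = "[RET]"  # special token that triggers a retrieval call, as in ScholarCopilot

# Toy citation index: paper id -> bag of keywords describing it (hypothetical ids).
INDEX = {
    "smith2020rag": {"retrieval", "augmented", "generation"},
    "lee2019graph": {"graph", "neural", "networks"},
    "chen2021cite": {"citation", "prediction", "benchmark"},
}

def retrieve(context_words, k=1):
    """Rank indexed papers by keyword overlap with the generated context so far."""
    scored = sorted(
        INDEX.items(),
        key=lambda item: len(item[1] & set(context_words)),
        reverse=True,
    )
    return [pid for pid, _ in scored[:k]]

def generate_with_citations(draft_tokens):
    """Replace each [RET] token with the best-matching paper id,
    conditioning each retrieval on the text emitted before it."""
    output = []
    for tok in draft_tokens:
        if tok == RET:
            best = retrieve(output, k=1)
            output.append(f"[{best[0]}]")
        else:
            output.append(tok)
    return " ".join(output)

draft = ["citation", "prediction", "is", "an", "open", "benchmark", "task", RET]
result = generate_with_citations(draft)
# -> "citation prediction is an open benchmark task [chen2021cite]"
```

In a real system the keyword overlap would be replaced by dense embedding similarity (as in CiteRAG's fine-tuned retrievers) and the draft tokens by an LLM's decoded output, but the control flow is the same.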
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.