SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature
- URL: http://arxiv.org/abs/2511.14362v1
- Date: Tue, 18 Nov 2025 11:09:19 GMT
- Title: SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature
- Authors: Hang Ding, Yilun Zhao, Tiansheng Hu, Manasi Patwardhan, Arman Cohan
- Abstract summary: We introduce SciRAG, an open-source framework for scientific literature exploration. We introduce three key innovations: (1) adaptive retrieval that flexibly alternates between sequential and parallel evidence gathering; (2) citation-aware symbolic reasoning that leverages citation graphs to organize and filter documents; and (3) outline-guided synthesis that plans, critiques, and refines answers to ensure coherence and transparent attribution.
- Score: 52.36039386997026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The accelerating growth of scientific publications has intensified the need for scalable, trustworthy systems to synthesize knowledge across diverse literature. While recent retrieval-augmented generation (RAG) methods have improved access to scientific information, they often overlook citation graph structure, adapt poorly to complex queries, and yield fragmented, hard-to-verify syntheses. We introduce SciRAG, an open-source framework for scientific literature exploration that addresses these gaps through three key innovations: (1) adaptive retrieval that flexibly alternates between sequential and parallel evidence gathering; (2) citation-aware symbolic reasoning that leverages citation graphs to organize and filter supporting documents; and (3) outline-guided synthesis that plans, critiques, and refines answers to ensure coherence and transparent attribution. Extensive experiments across multiple benchmarks such as QASA and ScholarQA demonstrate that SciRAG outperforms prior systems in factual accuracy and synthesis quality, establishing a new foundation for reliable, large-scale scientific knowledge aggregation.
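The citation-aware filtering idea described in the abstract can be illustrated with a minimal, hypothetical sketch (the function name and scoring rule are illustrative assumptions, not SciRAG's actual algorithm): rank retrieved candidates by how densely they cite, and are cited by, the other candidates, then keep the most connected subset.

```python
def filter_by_citation_graph(candidates, citations, keep=3):
    """Rank candidate papers by citation links among themselves.

    candidates: list of paper ids returned by retrieval
    citations:  dict mapping paper id -> set of ids it cites
    keep:       number of top-connected papers to retain
    """
    pool = set(candidates)

    def degree(paper):
        # outgoing links to other papers in the candidate pool
        cites = len(citations.get(paper, set()) & pool)
        # incoming links from other papers in the pool
        cited = sum(paper in citations.get(q, set()) for q in pool if q != paper)
        return cites + cited

    return sorted(candidates, key=degree, reverse=True)[:keep]
```

In this sketch, papers with no citation ties to the rest of the pool sink to the bottom, which mirrors the abstract's claim that citation structure helps filter supporting documents.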
Related papers
- VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification [107.75781898355562]
We introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution. We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.
arXiv Detail & Related papers (2025-10-13T13:38:54Z)
- SciGPT: A Large Language Model for Scientific Literature Understanding and Knowledge Discovery [3.779883844533933]
This paper presents SciGPT, a domain-adapted model for scientific literature understanding, and ScienceBench, an open-source benchmark tailored to evaluate scientific LLMs. Built on the Qwen3 architecture, SciGPT incorporates three key innovations: (1) low-cost domain distillation via a two-stage pipeline to balance performance and efficiency; (2) a Sparse Mixture-of-Experts attention mechanism that cuts memory consumption by 55% for 32,000-token long-context reasoning; and (3) knowledge-aware adaptation integrating domain-specific nuances. Experimental results on ScienceBench show that SciGPT outperforms GPT-4o in core scientific tasks including sequence
arXiv Detail & Related papers (2025-09-09T16:09:19Z)
- SciTopic: Enhancing Topic Discovery in Scientific Literature through Advanced LLM [19.949137890090814]
We propose an advanced topic discovery method enhanced by large language models (LLMs) to improve scientific topic identification. Specifically, we build a textual encoder to capture the content from scientific publications, including metadata, title, and abstract. We then construct a space optimization module that integrates entropy-based sampling and triplet tasks guided by LLMs. Experiments conducted on three real-world datasets demonstrate that SciTopic outperforms the state-of-the-art (SOTA) scientific topic discovery methods.
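The entropy-based sampling mentioned above can be sketched as follows. This is a hypothetical illustration (the function names and the use of assignment entropy as the sampling criterion are assumptions): documents whose topic-assignment distribution has the highest entropy are the most ambiguous, and therefore the most useful to route to an LLM for guidance.

```python
import math

def assignment_entropy(probs):
    """Shannon entropy of a document's topic-assignment distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sample_ambiguous(doc_probs, k=2):
    """Pick the k documents with the most uncertain (highest-entropy)
    topic assignments -- candidates for LLM-guided refinement.

    doc_probs: list of (doc_id, [p_topic1, p_topic2, ...]) pairs
    """
    ranked = sorted(doc_probs,
                    key=lambda item: assignment_entropy(item[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]
```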
arXiv Detail & Related papers (2025-08-28T07:55:06Z)
- HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis [55.2480439325792]
HySemRAG is a framework that combines Extract, Transform, Load (ETL) pipelines with Retrieval-Augmented Generation (RAG). The system addresses limitations in existing RAG architectures through a multi-layered approach.
arXiv Detail & Related papers (2025-08-01T20:30:42Z)
- TrustRAG: An Information Assistant with Retrieval Augmented Generation [73.84864898280719]
TrustRAG is a novel framework that enhances RAG from three perspectives: indexing, retrieval, and generation. We open-source the TrustRAG framework and provide a demonstration studio designed for excerpt-based question answering tasks.
arXiv Detail & Related papers (2025-02-19T13:45:27Z)
- CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs [9.718354494802002]
Contextualized Graph Retrieval-Augmented Generation (CG-RAG) is a novel framework that integrates sparse and dense retrieval signals within graph structures. First, we propose a contextual graph representation for citation graphs, effectively capturing both explicit and implicit connections within and across documents. Second, we introduce Lexical-Semantic Graph Retrieval (LeSeGR), which seamlessly integrates sparse and dense retrieval signals with graph encoding. Third, we present a context-aware generation strategy that utilizes the retrieved graph-structured information to generate precise and contextually enriched responses.
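The sparse-plus-dense fusion idea behind CG-RAG can be sketched in a few lines. This is an illustrative assumption, not LeSeGR itself: a lexical-overlap score stands in for the sparse signal, a dot product over pre-normalized embeddings stands in for the dense signal, and a single weight blends them.

```python
def hybrid_score(query_tokens, doc_tokens, query_vec, doc_vec, alpha=0.5):
    """Blend a sparse lexical-overlap score with a dense embedding score.

    query_tokens, doc_tokens: token lists for the sparse signal
    query_vec, doc_vec:       pre-normalized embedding vectors
    alpha:                    weight on the sparse signal (0..1)
    """
    query_set = set(query_tokens)
    # fraction of query tokens that appear in the document (sparse signal)
    sparse = len(query_set & set(doc_tokens)) / max(len(query_set), 1)
    # dot product of normalized embeddings (dense signal)
    dense = sum(q * d for q, d in zip(query_vec, doc_vec))
    return alpha * sparse + (1 - alpha) * dense
```

Graph-based methods like LeSeGR go further by propagating such scores along citation edges, but the per-document fusion step reduces to a weighted combination of this kind.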
arXiv Detail & Related papers (2025-01-25T04:18:08Z)
- LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis [0.16385815610837165]
This paper introduces the LLMs4Synthesis framework, designed to enhance the capabilities of Large Language Models (LLMs) in generating high-quality scientific syntheses.
It addresses the need for rapid, coherent, and contextually rich integration of scientific insights, leveraging both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-09-27T15:04:39Z)
- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding [22.131371019641417]
Despite Large Language Models' success, they face challenges in scientific literature understanding. We propose a hybrid strategy that integrates continual pre-training (CPT) and supervised fine-tuning (SFT). We present a suite of LLMs: SciLitLLM, specialized in scientific literature understanding.
arXiv Detail & Related papers (2024-08-28T05:41:52Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.