PT-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Papers
- URL: http://arxiv.org/abs/2602.13647v1
- Date: Sat, 14 Feb 2026 07:40:09 GMT
- Title: PT-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Papers
- Authors: Rui Yu, Tianyi Wang, Ruixia Liu, Yinglong Wang,
- Abstract summary: We propose PT-RAG, an RAG framework that treats the native hierarchical structure of academic papers as a low-entropy retrieval prior.<n>It first inherits the native hierarchy to construct a structure-fidelity PaperTree index, which prevents entropy increase at the source.<n>It then designs a path-guided retrieval mechanism that aligns query semantics to relevant sections and selects high relevance root-to-leaf paths under a fixed token budget.
- Score: 8.665160048483655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-augmented generation (RAG) is increasingly applied to question-answering over long academic papers, where accurate evidence allocation under a fixed token budget is critical. Existing approaches typically flatten academic papers into unstructured chunks during preprocessing, which destroys the native hierarchical structure. This loss forces retrieval to operate in a disordered space, thereby producing fragmented contexts, misallocating tokens to non-evidential regions under finite token budgets, and increasing the reasoning burden for downstream language models. To address these issues, we propose PT-RAG, an RAG framework that treats the native hierarchical structure of academic papers as a low-entropy retrieval prior. PT-RAG first inherits the native hierarchy to construct a structure-fidelity PaperTree index, which prevents entropy increase at the source. It then designs a path-guided retrieval mechanism that aligns query semantics to relevant sections and selects high relevance root-to-leaf paths under a fixed token budget, yielding compact, coherent, and low-entropy retrieval contexts. In contrast to existing RAG approaches, PT-RAG avoids entropy increase caused by destructive preprocessing and provides a native low-entropy structural basis for subsequent retrieval. To assess this design, we introduce entropy-based structural diagnostics that quantify retrieval fragmentation and evidence allocation accuracy. On three academic question-answering benchmarks, PT-RAG achieves consistently lower section entropy and evidence alignment cross entropy than strong baselines, indicating reduced context fragmentation and more precise allocation to evidential regions. These structural advantages directly translate into higher answer quality.
Related papers
- On Multi-Step Theorem Prediction via Non-Parametric Structural Priors [50.16583672681106]
In this work, we explore training-free theorem prediction through the lens of in-context learning (ICL)<n>We propose Theorem Precedence Graphs, which encode temporal dependencies from historical solution traces as directed graphs, and impose explicit topological constraints that effectively prune the search space during inference.<n>Experiments on the FormalGeo7k benchmark show that our method achieves 89.29% accuracy, substantially outperforming ICL baselines and matching state-of-the-art supervised models.
arXiv Detail & Related papers (2026-03-05T06:08:50Z) - PruneRAG: Confidence-Guided Query Decomposition Trees for Efficient Retrieval-Augmented Generation [19.832367438725306]
PruneRAG builds a structured query decomposition tree to perform stable and efficient reasoning.<n>We define the Evidence Forgetting Rate as a metric to quantify cases where golden evidence is retrieved but not correctly used.
arXiv Detail & Related papers (2026-01-16T06:38:17Z) - Disco-RAG: Discourse-Aware Retrieval-Augmented Generation [81.53888908988756]
We propose Disco-RAG, a discourse-aware framework that injects discourse signals into the generation process.<n>Our method constructs intra-chunk discourse trees to capture local hierarchies and builds inter-chunk rhetorical graphs to model cross-passage coherence.<n>Experiments on question answering and long-document summarization benchmarks show the efficacy of our approach.
arXiv Detail & Related papers (2026-01-07T20:32:50Z) - Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey [92.71325249013535]
Deliberative tree search is a cornerstone of Large Language Model (LLM) research.<n>This paper introduces a unified framework that deconstructs search algorithms into three core components.
arXiv Detail & Related papers (2025-10-11T03:29:18Z) - LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval [10.566901995776025]
LeanRAG is a framework that combines knowledge aggregation and retrieval strategies.<n>It can mitigate the substantial overhead associated with path retrieval on graphs and minimizes redundant information retrieval.
arXiv Detail & Related papers (2025-08-14T06:47:18Z) - You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures [16.867592142212203]
Large language models (LLMs) often suffer from hallucination, generating factually incorrect statements when handling questions beyond their knowledge.<n>Retrieval-augmented generation (RAG) addresses this by retrieving query-relevant contexts from knowledge bases to support LLM reasoning.<n>Existing Graph-based RAG methods rely on a costly process to transform the corpus into a graph, introducing overwhelming token cost and update latency.<n>We propose LogicRAG that dynamically extracts reasoning structures at inference time to guide adaptive retrieval without any pre-built graph.
arXiv Detail & Related papers (2025-08-08T08:07:40Z) - Respecting Temporal-Causal Consistency: Entity-Event Knowledge Graphs for Retrieval-Augmented Generation [69.45495166424642]
We develop a robust and discriminative QA benchmark to measure temporal, causal, and character consistency understanding in narrative documents.<n>We then introduce Entity-Event RAG (E2RAG), a dual-graph framework that keeps separate entity and event subgraphs linked by a bipartite mapping.<n>Across ChronoQA, our approach outperforms state-of-the-art unstructured and KG-based RAG baselines, with notable gains on causal and character consistency queries.
arXiv Detail & Related papers (2025-06-06T10:07:21Z) - An Ontology-Driven Graph RAG for Legal Norms: A Structural, Temporal, and Deterministic Approach [0.0]
RAG systems in the legal domain face a critical challenge: standard, flat-text retrieval is blind to the hierarchical, diachronic, and causal structure of law, leading to anachronistic and unreliable answers.<n>This paper introduces the Structure-Aware Temporal Graph RAG (SAT-Graph RAG), an ontology-driven framework designed to overcome these limitations by explicitly modeling the formal structure and diachronic nature of legal norms.
arXiv Detail & Related papers (2025-04-29T18:36:57Z) - Retrieval is Not Enough: Enhancing RAG Reasoning through Test-Time Critique and Optimization [58.390885294401066]
Retrieval-augmented generation (RAG) has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs)<n>RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions.<n>We propose AlignRAG, a novel iterative framework grounded in Critique-Driven Alignment (CDA)<n>We introduce AlignRAG-auto, an autonomous variant that dynamically terminates refinement, removing the need to pre-specify the number of critique iterations.
arXiv Detail & Related papers (2025-04-21T04:56:47Z) - TopoFR: A Closer Look at Topology Alignment on Face Recognition [58.45515807380505]
We propose TopoFR, a novel FR model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE.<n> PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of FR model.<n> Experimental results on popular face benchmarks demonstrate the superiority of our TopoFR over the state-of-the-art methods.
arXiv Detail & Related papers (2024-10-14T14:58:30Z) - Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic [51.967603572656266]
We introduce a consistent and theoretically grounded approach to annotating decompositional entailment.
We find that our new dataset, RDTE, has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets.
We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in an entailment tree reasoning engine significantly improves both accuracy and proof quality.
arXiv Detail & Related papers (2024-02-22T18:55:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.