Is Implicit Knowledge Enough for LLMs? A RAG Approach for Tree-based Structures
- URL: http://arxiv.org/abs/2510.10806v2
- Date: Tue, 21 Oct 2025 16:10:00 GMT
- Title: Is Implicit Knowledge Enough for LLMs? A RAG Approach for Tree-based Structures
- Authors: Mihir Gupte, Paolo Giusto, Ramesh S,
- Abstract summary: Large Language Models (LLMs) are adept at generating responses based on information within their context.<n>Retrieval-Augmented Generation (RAG) retrieves relevant documents to augment the model's in-context learning.<n>We propose a novel method to linearize knowledge from tree-like structures by generating implicit, aggregated summaries at each hierarchical level.
- Score: 0.5352699766206808
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) are adept at generating responses based on information within their context. While this ability is useful for interacting with structured data like code files, another popular method, Retrieval-Augmented Generation (RAG), retrieves relevant documents to augment the model's in-context learning. However, it is not well-explored how to best represent this retrieved knowledge for generating responses on structured data, particularly hierarchical structures like trees. In this work, we propose a novel bottom-up method to linearize knowledge from tree-like structures (like a GitHub repository) by generating implicit, aggregated summaries at each hierarchical level. This approach enables the knowledge to be stored in a knowledge base and used directly with RAG. We then compare our method to using RAG on raw, unstructured code, evaluating the accuracy and quality of the generated responses. Our results show that while response quality is comparable across both methods, our approach generates over 68% fewer documents in the retriever, a significant gain in efficiency. This finding suggests that leveraging implicit, linearized knowledge may be a highly effective and scalable strategy for handling complex, hierarchical data structures.
Related papers
- Structured Knowledge Representation through Contextual Pages for Retrieval-Augmented Generation [53.768256130061765]
PAGER is a page-driven autonomous knowledge representation framework for RAG.<n>It iteratively retrieves and refines relevant documents to populate each slot, ultimately constructing a coherent page.<n>Experiments show that PAGER consistently outperforms all RAG baselines.
arXiv Detail & Related papers (2026-01-14T11:44:31Z) - BookRAG: A Hierarchical Structure-aware Index-based Approach for Retrieval-Augmented Generation on Complex Documents [11.158307125677375]
Retrieval-Augmented Generation (RAG) queries highly relevant information from external complex documents.<n>We introduce BookRAG, a novel RAG approach targeted for documents with a hierarchical structure.<n>BookRAG achieves state-of-the-art performance, significantly outperforming baselines in both retrieval recall and QA accuracy.
arXiv Detail & Related papers (2025-12-03T03:40:49Z) - Scheduling Your LLM Reinforcement Learning with Reasoning Trees [18.720191133993715]
We introduce Reasoning Score (r-score), which measures the query's learning difficulty based on the structure of its reasoning tree.<n>Based on the r-score, we propose the Reasoning Tree Schedule (Re-Schedule), a scheduling algorithm that constructs a curriculum progressing from structurally simple (high r-score) to complex (low r-score) queries.
arXiv Detail & Related papers (2025-10-28T17:52:07Z) - Prompting is not Enough: Exploring Knowledge Integration and Controllable Generation on Large Language Models [89.65955788873532]
Open-domain question answering (OpenQA) represents a cornerstone in natural language processing (NLP)<n>We propose a novel framework named GenKI, which aims to improve the OpenQA performance by exploring Knowledge Integration and controllable Generation.
arXiv Detail & Related papers (2025-05-26T08:18:33Z) - ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation [16.204046295248546]
Retrieval-Augmented Generation (RAG) has proven effective in integrating external knowledge into large language models.<n>We introduce a novel graph-based RAG approach, called Attributed Community-based Hierarchical RAG (ArchRAG)<n>We build a novel hierarchical index structure for the attributed communities and develop an effective online retrieval method.
arXiv Detail & Related papers (2025-02-14T03:28:36Z) - ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels.<n>Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches.<n>Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.<n>This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.<n>Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs)
We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure.
Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z) - RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.<n>With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.<n> Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z) - Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers [0.0]
Retrieval-Augmented Generation (RAG) is a prevalent approach to infuse a private knowledge base of documents with Large Language Models (LLM) to build Generative Q&A (Question-Answering) systems.
We propose the 'Blended RAG' method of leveraging semantic search techniques, such as Vector indexes and Sparse indexes, blended with hybrid query strategies.
Our study achieves better retrieval results and sets new benchmarks for IR (Information Retrieval) datasets like NQ and TREC-COVID datasets.
arXiv Detail & Related papers (2024-03-22T17:13:46Z) - Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue
System [40.33178881317882]
We propose the application of maximal marginal likelihood to train a perceptive retriever by utilizing signals from response generation for supervision.
We evaluate our approach on three task-oriented dialogue datasets using T5 and ChatGPT as the backbone models.
arXiv Detail & Related papers (2023-10-13T06:03:47Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.