A Survey on Retrieval And Structuring Augmented Generation with Large Language Models
- URL: http://arxiv.org/abs/2509.10697v1
- Date: Fri, 12 Sep 2025 21:25:25 GMT
- Title: A Survey on Retrieval And Structuring Augmented Generation with Large Language Models
- Authors: Pengcheng Jiang, Siru Ouyang, Yizhu Jiao, Ming Zhong, Runchu Tian, Jiawei Han,
- Abstract summary: Large Language Models (LLMs) have revolutionized natural language processing with their remarkable capabilities in text generation and reasoning. However, these models face critical challenges when deployed in real-world applications, including hallucination generation, outdated knowledge, and limited domain expertise. Retrieval And Structuring (RAS) Augmented Generation addresses these limitations by integrating dynamic information retrieval with structured knowledge representations.
- Score: 29.707181003761004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have revolutionized natural language processing with their remarkable capabilities in text generation and reasoning. However, these models face critical challenges when deployed in real-world applications, including hallucination generation, outdated knowledge, and limited domain expertise. Retrieval And Structuring (RAS) Augmented Generation addresses these limitations by integrating dynamic information retrieval with structured knowledge representations. This survey (1) examines retrieval mechanisms including sparse, dense, and hybrid approaches for accessing external knowledge; (2) explores text structuring techniques such as taxonomy construction, hierarchical classification, and information extraction that transform unstructured text into organized representations; and (3) investigates how these structured representations integrate with LLMs through prompt-based methods, reasoning frameworks, and knowledge embedding techniques. It also identifies technical challenges in retrieval efficiency, structure quality, and knowledge integration, while highlighting research opportunities in multimodal retrieval, cross-lingual structures, and interactive systems. This comprehensive overview provides researchers and practitioners with insights into RAS methods, applications, and future directions.
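The abstract names sparse, dense, and hybrid retrieval without showing how the three relate. The following is a minimal sketch, not the survey's method: the sparse scorer is a simplified IDF-weighted term match, the "dense" encoder is a toy hashed-trigram embedding standing in for a learned model, and the two rankings are combined with reciprocal rank fusion, one common fusion choice. All function names and the constant `k=60` are assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def sparse_scores(query, docs):
    """Sparse retrieval sketch: term frequency weighted by a smoothed IDF."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in doc_tokens for t in set(toks))
    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)
        scores.append(
            sum(tf[t] * math.log(1 + n / df[t]) for t in tokenize(query) if t in tf)
        )
    return scores


def embed(text, dim=64):
    """Toy stand-in for a learned dense encoder: hashed character trigrams,
    L2-normalized. A real system would call a trained embedding model."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dense_scores(query, docs):
    """Dense retrieval sketch: cosine similarity between unit vectors."""
    q = embed(query)
    return [sum(a * b for a, b in zip(q, embed(d))) for d in docs]


def rank(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_rank(query, docs, k=60):
    """Hybrid retrieval sketch: fuse both rankings with reciprocal rank fusion."""
    fused = [0.0] * len(docs)
    for scores in (sparse_scores(query, docs), dense_scores(query, docs)):
        for position, doc_id in enumerate(rank(scores), start=1):
            fused[doc_id] += 1.0 / (k + position)
    return rank(fused)
```

Calling `hybrid_rank("hybrid retrieval", docs)` on a small list of strings returns document indices, best match first. Fusing ranks rather than raw scores sidesteps the fact that the sparse and dense scores live on different scales.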
Related papers
- Towards Improving Interpretability of Language Model Generation through a Structured Knowledge Discovery Approach [33.17711262799183]
We develop a task-agnostic structured knowledge hunter for knowledge-enhanced text generation tasks. Our model achieves high interpretability, enabling users to comprehend the model output generation process. We empirically demonstrate the effectiveness of our model in both internal knowledge-enhanced table-to-text generation on the RotoWireFG dataset and external knowledge-enhanced dialogue response generation on the KdConv dataset.
arXiv Detail & Related papers (2025-11-28T16:43:46Z)
- Deep Research: A Systematic Survey [118.82795024422722]
Deep Research (DR) aims to combine the reasoning capabilities of large language models with external tools, such as search engines. This survey presents a comprehensive and systematic overview of deep research systems.
arXiv Detail & Related papers (2025-11-24T15:28:28Z)
- A Survey of Context Engineering for Large Language Models [31.68644305980195]
This survey introduces Context Engineering, a formal discipline that transcends simple prompt design. We first examine the foundational components: context retrieval and generation, context processing, and context management. We then explore how these components are architecturally integrated to create sophisticated system implementations.
arXiv Detail & Related papers (2025-07-17T17:50:36Z)
- STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking [2.355572228890207]
StructSense is a modular, task-agnostic, open-source framework for structured information extraction built on Large Language Models. It is guided by domain-specific symbolic knowledge, enabling it to encode complex domain content effectively. We demonstrate that StructSense can overcome both the limitations of domain sensitivity and the lack of cross-task generalizability.
arXiv Detail & Related papers (2025-07-04T15:51:07Z)
- RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation [46.237206695937246]
We propose Retrieval-And-Structuring (RAS), a framework that dynamically constructs query-specific knowledge graphs. On seven knowledge-intensive benchmarks, RAS consistently outperforms strong baselines. Our results demonstrate that dynamic, query-specific knowledge structuring offers a robust path to improving reasoning accuracy and robustness in language model generation.
arXiv Detail & Related papers (2025-02-16T05:01:49Z)
- A Comprehensive Survey on Integrating Large Language Models with Knowledge-Based Methods [4.686190098233778]
Large Language Models (LLMs) can be integrated with structured knowledge-based systems. This article surveys the relationship between LLMs and knowledge bases, looks at how they can be applied in practice, and discusses related technical, operational, and ethical challenges. It demonstrates the merits of incorporating generative AI into structured knowledge-base systems concerning data contextualization, model accuracy, and utilization of knowledge resources.
arXiv Detail & Related papers (2025-01-19T23:25:21Z)
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs).
We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure.
Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z)
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various domains in ML using consistent notation, which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models trained to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)