Structure and Diversity Aware Context Bubble Construction for Enterprise Retrieval Augmented Systems
- URL: http://arxiv.org/abs/2601.10681v1
- Date: Thu, 15 Jan 2026 18:43:19 GMT
- Title: Structure and Diversity Aware Context Bubble Construction for Enterprise Retrieval Augmented Systems
- Authors: Amir Khurshid, Abhishek Sehgal,
- Abstract summary: Large language model (LLM) contexts are typically constructed using retrieval-augmented generation (RAG)<n>This paper proposes a structure-informed and diversity-constrained context bubble construction framework.
- Score: 0.7734726150561088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) contexts are typically constructed using retrieval-augmented generation (RAG), which involves ranking and selecting the top-k passages. The approach causes fragmentation in information graphs in document structures, over-retrieval, and duplication of content alongside insufficient query context, including 2nd and 3rd order facets. In this paper, a structure-informed and diversity-constrained context bubble construction framework is proposed that assembles coherent, citable bundles of spans under a strict token budget. The method preserves and exploits inherent document structure by organising multi-granular spans (e.g., sections and rows) and using task-conditioned structural priors to guide retrieval. Starting from high-relevance anchor spans, a context bubble is constructed through constrained selection that balances query relevance, marginal coverage, and redundancy penalties. It will explicitly constrain diversity and budget, producing compact and informative context sets, unlike top-k retrieval. Moreover, a full retrieval is emitted that traces the scoring and selection choices of the records, thus providing auditability and deterministic tuning. Experiments on enterprise documents demonstrate the efficiency of context bubble as it significantly reduces redundant context, is better able to cover secondary facets and has a better answer quality and citation faithfulness within a limited context window. Ablation studies demonstrate that both structural priors as well as diversity constraint selection are necessary; removing either component results in a decline in coverage and an increase in redundant or incomplete context.
Related papers
- DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing [53.85037373860246]
We introduce Deep Synth-Eval, a benchmark designed to objectively evaluate information consolidation capabilities.<n>We propose a fine-grained evaluation protocol using General Checklists (for factual coverage) and Constraint Checklists (for structural organization)<n>Our results demonstrate that agentic plan-and-write significantly outperform single-turn generation.
arXiv Detail & Related papers (2026-01-07T03:07:52Z) - From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition [46.36937947958481]
We introduce a novel explicit compression framework designed to preserve both global structure and fine-grained details.<n>Our approach reformulates a structural context compression as a structure-then-select process.<n>Our method achieves state-of-the-art structural prediction accuracy and significantly outperforms frontier LLMs.
arXiv Detail & Related papers (2025-12-16T09:52:58Z) - BoundRL: Efficient Structured Text Segmentation through Reinforced Boundary Generation [26.825801831400003]
BoundRL performs token-level text segmentation and label prediction for long structured texts.<n>Instead of generating complete contents for each segment, it generates only a sequence of starting tokens.<n>It reconstructs the complete contents by locating these tokens within the original texts.
arXiv Detail & Related papers (2025-10-23T02:56:10Z) - Towards Context-aware Reasoning-enhanced Generative Searching in E-commerce [61.03081096959132]
We propose a context-aware reasoning-enhanced generative search framework for better textbfunderstanding the complicated context.<n>Our approach achieves superior performance compared with strong baselines, validating its effectiveness for search-based recommendation.
arXiv Detail & Related papers (2025-10-19T16:46:11Z) - Structure-R1: Dynamically Leveraging Structural Knowledge in LLM Reasoning through Reinforcement Learning [29.722512436773638]
We propose textscStructure-R1, a framework that transforms retrieved content into structured representations optimized for reasoning.<n>We show that textscStructure-R1 consistently achieves competitive performance with a 7B-scale backbone model.<n>Our theoretical analysis demonstrates how structured representations enhance reasoning by improving information density and contextual clarity.
arXiv Detail & Related papers (2025-10-16T23:19:28Z) - Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation [57.97548022208733]
We show that seemingly superficial choices in key-value extraction can induce shifts in accuracy and stability.<n>We introduce Contextual Normalization, a strategy that adaptively standardizes context representations before generation.
arXiv Detail & Related papers (2025-10-15T06:28:25Z) - Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering [51.7493726399073]
We present a discourse-aware hierarchical framework to enhance long document question answering.<n>The framework involves three key innovations: specialized discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval.
arXiv Detail & Related papers (2025-05-26T14:45:12Z) - CORG: Generating Answers from Complex, Interrelated Contexts [57.213304718157985]
In a real-world corpus, knowledge frequently recurs across documents but often contains inconsistencies due to ambiguous naming, outdated information, or errors.<n>Previous research has shown that language models struggle with these complexities, typically focusing on single factors in isolation.<n>We introduce Context Organizer (CORG), a framework that organizes multiple contexts into independently processed groups.
arXiv Detail & Related papers (2025-04-25T02:40:48Z) - Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases [78.62158923194153]
Text-rich Graph Knowledge Bases (TG-KBs) have become increasingly crucial for answering queries by providing textual and structural knowledge.<n>We propose a Mixture of Structural-and-Textual Retrieval (MoR) to retrieve these two types of knowledge via a Planning-Reasoning-Organizing framework.
arXiv Detail & Related papers (2025-02-27T17:42:52Z) - Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering [12.60063463163226]
IIER captures the internal connections between document chunks by considering three types of interactions: structural, keyword, and semantic.
It identifies multiple seed nodes based on the target question and iteratively searches for relevant chunks to gather supporting evidence.
It refines the context and reasoning chain, aiding the large language model in reasoning and answer generation.
arXiv Detail & Related papers (2024-08-06T02:39:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.