BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2508.05100v1
- Date: Thu, 07 Aug 2025 07:37:25 GMT
- Title: BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation
- Authors: Yuhao Wang, Ruiyang Ren, Yucheng Wang, Jing Liu, Wayne Xin Zhao, Hua Wu, Haifeng Wang
- Abstract summary: We propose the balanced entropy-engineered RAG (BEE-RAG) framework to improve adaptability of RAG systems to varying context lengths. BEE-RAG separates attention sensitivity from context length, ensuring a stable entropy level. Building upon this, we introduce a zero-shot inference strategy for multi-importance estimation and a parameter-efficient adaptive fine-tuning mechanism.
- Score: 77.10390725623125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid advancement of large language models (LLMs), retrieval-augmented generation (RAG) has emerged as a critical approach to supplement the inherent knowledge limitations of LLMs. However, due to the typically large volume of retrieved information, RAG tends to operate with long context lengths. From the perspective of entropy engineering, we identify unconstrained entropy growth and attention dilution due to long retrieval context as significant factors affecting RAG performance. In this paper, we propose the balanced entropy-engineered RAG (BEE-RAG) framework, which improves the adaptability of RAG systems to varying context lengths through the principle of entropy invariance. By leveraging balanced context entropy to reformulate attention dynamics, BEE-RAG separates attention sensitivity from context length, ensuring a stable entropy level. Building upon this, we introduce a zero-shot inference strategy for multi-importance estimation and a parameter-efficient adaptive fine-tuning mechanism to obtain the optimal balancing factor for different settings. Extensive experiments across multiple RAG tasks demonstrate the effectiveness of BEE-RAG.
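The abstract's point about entropy growth and attention dilution can be illustrated with a toy experiment. The sketch below is not the BEE-RAG method: it draws random Gaussian attention scores, measures the Shannon entropy of one softmax attention row as the context grows, and compares it against a row whose scores are multiplied by a length-dependent factor. The function `length_aware_scale` and its `base_len` parameter are hypothetical stand-ins for a balancing factor, chosen only for illustration; the paper's actual formulation is not given on this page.

```python
import math
import random


def softmax(scores, scale=1.0):
    """Numerically stable softmax over a list of scaled scores."""
    scaled = [scale * s for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def attention_entropy(n, scale=1.0, seed=0):
    """Entropy of one attention row over n context tokens (toy Gaussian scores)."""
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return entropy(softmax(scores, scale))


def length_aware_scale(n, base_len=128):
    """Hypothetical factor (an assumption, not from the paper): grows with log(n)."""
    return math.log(n) / math.log(base_len)


if __name__ == "__main__":
    for n in (128, 1024, 8192):
        plain = attention_entropy(n)
        scaled = attention_entropy(n, scale=length_aware_scale(n))
        print(f"n={n:5d}  plain={plain:.3f}  scaled={scaled:.3f}")
```

With unscaled scores the entropy rises as more tokens compete for attention. Multiplying the scores by a factor of at least 1 can only lower the entropy of the same row, so the scaled variant grows more slowly. It does not make the entropy exactly constant, which is presumably why the paper estimates or learns its balancing factor rather than fixing it by hand.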
Related papers
- The Role of Entropy in Visual Grounding: Analysis and Optimization [69.51909526456606]
We introduce ECVGPO (Entropy Control Visual Grounding Policy Optimization), an interpretable algorithm designed for effective entropy regulation. Experiments show that ECVGPO achieves broad improvements across various benchmarks and models.
arXiv Detail & Related papers (2025-12-07T08:33:55Z)
- Revisiting Entropy in Reinforcement Learning for Large Reasoning Models [54.96908589622163]
We investigate the entropy dynamics of large language models trained with reinforcement learning with verifiable rewards (RLVR). Our findings reveal that the number of off-policy updates, the diversity of training data, and the clipping thresholds in the optimization objective are critical factors influencing the entropy of LLMs trained with RLVR.
arXiv Detail & Related papers (2025-11-08T12:50:41Z)
- Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning [55.59724323303857]
We propose a framework that balances exploration and exploitation via three components: difficulty-aware coefficient allocation, initial-anchored target entropy, and dynamic global coefficient adjustment. Experiments on multiple mathematical reasoning benchmarks show that AER consistently outperforms baselines, improving both reasoning accuracy and exploration capability.
arXiv Detail & Related papers (2025-10-13T03:10:26Z)
- Improving End-to-End Training of Retrieval-Augmented Generation Models via Joint Stochastic Approximation [9.493788719707835]
Retrieval-augmented generation (RAG) has become a widely recognized paradigm to combine parametric memory with non-parametric memories. A major challenge in end-to-end optimization of the RAG model is that marginalization over relevant passages is required. In this paper, we propose and develop joint stochastic approximation (JSA) based end-to-end training of RAG. The JSA algorithm is an extension of the EM (expectation-maximization) algorithm and is particularly powerful in estimating latent variable models.
arXiv Detail & Related papers (2025-08-25T16:17:16Z)
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge. It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z)
- RADIANT: Retrieval AugmenteD entIty-context AligNmenT -- Introducing RAG-ability and Entity-Context Divergence [5.066415370344766]
Retrieval-Augmented Generation (RAG) is a technique to enhance factual accuracy by integrating external knowledge into the generation process. This paper introduces Radiant, a framework that merges RAG with alignment designed to optimize the interplay between retrieved evidence and generated content.
arXiv Detail & Related papers (2025-06-28T21:40:35Z)
- Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation [32.30660197797758]
We introduce a novel Jensen-Shannon Divergence driven method to Attribute Response to Context (ARC-JSD). We demonstrate superior accuracy and significant computational efficiency improvements compared to the previous surrogate-based method. Our mechanistic analysis reveals specific attention heads and multilayer perceptron (MLP) layers responsible for context attribution.
arXiv Detail & Related papers (2025-05-22T09:04:03Z)
- RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving [9.962031642362813]
Retrieval-augmented generation (RAG) is emerging as a popular approach for reliable LLM serving. RAG is a structured abstraction that captures the wide range of RAG algorithms. RAGO is a system optimization framework for efficient RAG serving.
arXiv Detail & Related papers (2025-03-18T18:58:13Z)
- Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
- UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation [93.38604803625294]
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG).
We use Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks.
UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-03T17:39:38Z)
- Introducing a new hyper-parameter for RAG: Context Window Utilization [0.0]
RAG systems enhance generative models by incorporating relevant information retrieved from external knowledge bases.
The size of the text chunks retrieved and processed is a critical factor influencing RAG performance.
This study aims to identify the optimal chunk size that maximizes answer generation quality.
arXiv Detail & Related papers (2024-07-29T08:38:14Z)
- RAGGED: Towards Informed Design of Scalable and Stable RAG Systems [51.171355532527365]
Retrieval-augmented generation (RAG) enhances language models by integrating external knowledge. RAGGED is a framework for systematically evaluating RAG systems.
arXiv Detail & Related papers (2024-03-14T02:26:31Z)
- Gated Recurrent Neural Networks with Weighted Time-Delay Feedback [55.596897987498174]
We present a novel approach to modeling long-term dependencies in sequential data by introducing a gated recurrent unit (GRU) with a weighted time-delay feedback mechanism. Our proposed model, named $\tau$-GRU, is a discretized version of a continuous-time formulation of a recurrent unit, where the dynamics are governed by delay differential equations (DDEs).
arXiv Detail & Related papers (2022-12-01T02:26:34Z)