Cost-Aware Retrieval-Augmentation Reasoning Models with Adaptive Retrieval Depth
- URL: http://arxiv.org/abs/2510.15719v1
- Date: Fri, 17 Oct 2025 15:04:03 GMT
- Title: Cost-Aware Retrieval-Augmentation Reasoning Models with Adaptive Retrieval Depth
- Authors: Helia Hashemi, Victor Rühle, Saravan Rajmohan,
- Abstract summary: We propose a retrieval-augmented reasoning model that dynamically adjusts the length of the retrieved document list.<n>We develop a cost-aware advantage function for training of efficient retrieval-augmented reasoning models through reinforcement learning.<n>We evaluate our approach on seven public question answering datasets and demonstrate significant efficiency gains, without compromising effectiveness.
- Score: 18.05278637533445
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reasoning models have gained significant attention due to their strong performance, particularly when enhanced with retrieval augmentation. However, these models often incur high computational costs, as both retrieval and reasoning tokens contribute substantially to the overall resource usage. In this work, we make the following contributions: (1) we propose a retrieval-augmented reasoning model that dynamically adjusts the length of the retrieved document list based on the query and retrieval results; (2) we develop a cost-aware advantage function for training of efficient retrieval-augmented reasoning models through reinforcement learning; and (3) we explore both memory- and latency-bound implementations of the proposed cost-aware framework for both proximal and group relative policy optimization algorithms. We evaluate our approach on seven public question answering datasets and demonstrate significant efficiency gains, without compromising effectiveness. In fact, we observed that the model latency decreases by ~16-20% across datasets, while its effectiveness increases by ~5% on average, in terms of exact match.
Related papers
- Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward [24.738836592075927]
We introduce a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward.<n>Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines.<n>Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval.
arXiv Detail & Related papers (2026-01-31T18:15:50Z) - Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data [57.996437077411315]
We study the reasoning behavior of large language models (LLMs) under limited computation budgets.<n>We introduce an anytime reasoning framework and the Anytime Index, a metric that quantifies how effectively solution quality improves as reasoning tokens increase.<n> Experiments on NaturalPlan (Trip), AIME, and GPQA datasets show consistent gains across Grok-3, GPT-oss, GPT-4.1/4o, and LLaMA models.
arXiv Detail & Related papers (2026-01-16T07:09:30Z) - Efficient Hyperparameter Search for Non-Stationary Model Training [11.55466013293614]
We introduce a two-stage paradigm to reduce the cost of model training for online learning systems.<n>Our core insight is that focusing on accurate identification in the first stage, rather than achieving peak performance, allows for aggressive cost-saving measures.<n>We validate our framework's effectiveness through a dual evaluation: first on the Criteo 1TB dataset, the largest suitable public benchmark, and second on an industrial advertising system operating at a scale two orders of magnitude larger.
arXiv Detail & Related papers (2025-12-01T04:06:24Z) - Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration [33.02780998281276]
Reinforcement learning with verifiable rewards (RLVR) has improved the reasoning ability of large language models.<n>This study investigates how simply leveraging intrinsic data properties, almost free benefit during training, can improve data efficiency for RLVR.
arXiv Detail & Related papers (2025-11-02T04:16:47Z) - Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost.<n>We propose two lightweight methods to enhance LRM efficiency.<n>First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction.<n>Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
arXiv Detail & Related papers (2025-06-18T17:18:12Z) - Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens [51.90059610606049]
This paper revisits the efficiency of such reasoning processes through an information-theoretic lens.<n>We propose two metrics, InfoBias and InfoGain, to quantify divergence from ideal reasoning paths and stepwise information contribution.<n>Motivated by these findings, we introduce an entropy-based Adaptive Think strategy that dynamically halts reasoning once confidence is sufficiently high.
arXiv Detail & Related papers (2025-05-23T13:38:56Z) - Studying the Role of Input-Neighbor Overlap in Retrieval-Augmented Language Models Training Efficiency [3.5634988336513587]
We investigate how varying levels of query-context overlap affect model performance during both training and inference.<n>Our experiments reveal that increased overlap initially has minimal effect, but substantially improves test-time perplexity and model accelerates learning above a critical threshold.
arXiv Detail & Related papers (2025-05-20T12:58:07Z) - Reasoning of Large Language Models over Knowledge Graphs with Super-Relations [53.14275361052276]
We propose the ReKnoS framework, which aims to Reason over Knowledge Graphs with Super-Relations.<n>Our framework's key advantages include the inclusion of multiple relation paths through super-relations.<n>The results demonstrate the superior performance of ReKnoS over existing state-of-the-art baselines, with an average accuracy gain of 2.92%.
arXiv Detail & Related papers (2025-03-28T06:11:04Z) - Reqo: A Robust and Explainable Query Optimization Cost Model [2.184775414778289]
We propose a tree model architecture based on Bidirectional Graph Neural Networks (Bi-GNN) aggregated by Gated Recurrent Units (GRUs)<n>We implement a novel learning-to-rank cost model that effectively quantifies the uncertainty in cost estimates using approximate probabilistic ML.<n>In addition, we propose the first explainability technique specifically designed for learning-based cost models.
arXiv Detail & Related papers (2025-01-29T04:48:51Z) - iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use [56.31110409360567]
Augmenting large language models with external tools is a promising approach to enhance their capabilities.<n>We show that training gains significantly decay as synthetic data increases.<n>We propose an iterative reinforced fine-tuning strategy designed to alleviate this limitation.
arXiv Detail & Related papers (2025-01-15T04:52:34Z) - Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality.<n>We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
arXiv Detail & Related papers (2024-11-30T10:56:30Z) - VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback [2.07180164747172]
This paper addresses the cost-efficiency aspect of Reinforcement Learning from Human Feedback (RLHF)<n>RLHF leverages datasets of human preferences over outputs of large language models (LLM)<n>We show that introducing an auction mechanism can play an essential role in enhancing the cost-efficiency of RLHF.
arXiv Detail & Related papers (2024-09-27T03:15:07Z) - Think-then-Act: A Dual-Angle Evaluated Retrieval-Augmented Generation [3.2134014920850364]
Large language models (LLMs) often face challenges such as temporal misalignment and generating hallucinatory content.
We propose a dual-angle evaluated retrieval-augmented generation framework textitThink-then-Act'
arXiv Detail & Related papers (2024-06-18T20:51:34Z) - Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning [100.73223416589596]
We propose a cost-sensitive portfolio selection method with deep reinforcement learning.
Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations.
A new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning.
arXiv Detail & Related papers (2020-03-06T06:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.