SAGE: Strategy-Adaptive Generation Engine for Query Rewriting
- URL: http://arxiv.org/abs/2506.19783v2
- Date: Sat, 26 Jul 2025 07:12:26 GMT
- Title: SAGE: Strategy-Adaptive Generation Engine for Query Rewriting
- Authors: Teng Wang, Hailei Gong, Changwang Zhang, Jun Wang,
- Abstract summary: We introduce the Strategy-Adaptive Generation Engine (SAGE), which operationalizes expert-crafted strategies in a reinforcement learning framework. SAGE not only achieves new state-of-the-art NDCG@10 results, but also uncovers a compelling emergent behavior. Our findings demonstrate that strategy-guided RL, enhanced with nuanced reward shaping, offers a scalable, efficient, and more interpretable paradigm for developing the next generation of robust information retrieval systems.
- Score: 8.941793732446856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Query rewriting is pivotal for enhancing dense retrieval, yet current methods demand large-scale supervised data or suffer from inefficient reinforcement learning (RL) exploration. In this work, we first establish that guiding Large Language Models (LLMs) with a concise set of expert-crafted strategies, such as semantic expansion and entity disambiguation, substantially improves retrieval effectiveness on challenging benchmarks, including HotpotQA, FEVER, NFCorpus, and SciFact. Building on this insight, we introduce the Strategy-Adaptive Generation Engine (SAGE), which operationalizes these strategies in an RL framework. SAGE introduces two novel reward shaping mechanisms-Strategic Credit Shaping (SCS) and Contrastive Reward Shaping (CRS)-to deliver more informative learning signals. This strategy-guided approach not only achieves new state-of-the-art NDCG@10 results, but also uncovers a compelling emergent behavior: the agent learns to select optimal strategies, reduces unnecessary exploration, and generates concise rewrites, lowering inference cost without sacrificing performance. Our findings demonstrate that strategy-guided RL, enhanced with nuanced reward shaping, offers a scalable, efficient, and more interpretable paradigm for developing the next generation of robust information retrieval systems.
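The abstract names two reward-shaping mechanisms (SCS and CRS) but does not define them here. As a rough, hypothetical sketch only, Contrastive Reward Shaping can be read as rewarding a rewrite for the NDCG@10 improvement it yields over the original query; the relevance lists and the exact reward form below are illustrative assumptions, not the paper's specification:

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k ranked results.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def contrastive_reward(rewrite_rels, baseline_rels, k=10):
    # Reward the rewrite only for its improvement over the original query,
    # so uninformative rewrites receive no credit.
    return ndcg_at_k(rewrite_rels, k) - ndcg_at_k(baseline_rels, k)

# Relevance of the top-ranked documents (1 = relevant, 0 = not),
# for a rewritten query vs. the original query.
print(contrastive_reward([1, 1, 0, 1], [0, 1, 0, 0]))
# positive reward: the rewrite improves NDCG@10 over the original query
```

Under this reading, a rewrite that retrieves no more relevant documents than the original earns zero reward, which is consistent with the reported emergent behavior of reduced unnecessary exploration.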
Related papers
- Learning Memory-Enhanced Improvement Heuristics for Flexible Job Shop Scheduling [39.98859285173431]
The flexible job-shop scheduling problem (FJSP) has attracted significant attention due to its complexity and strong alignment with real-world production scenarios. Current deep reinforcement learning (DRL)-based approaches to FJSP predominantly employ constructive methods. This paper proposes a Memory-enhanced Improvement Search framework with heterogeneous graph representation, MIStar.
arXiv Detail & Related papers (2026-03-03T10:43:01Z) - Expanding LLM Agent Boundaries with Strategy-Guided Exploration [51.98616048282804]
Reinforcement learning (RL) has demonstrated notable success in post-training large language models (LLMs) as agents for tasks such as computer use, tool calling, and coding. We propose Strategy-Guided Exploration (SGE) to shift exploration from low-level actions to higher-level language strategies.
arXiv Detail & Related papers (2026-03-02T16:28:39Z) - Generative Actor Critic [74.04971271003869]
Generative Actor Critic (GAC) is a novel framework that decouples sequential decision-making by reframing policy evaluation as learning a generative model of the joint distribution over trajectories and returns. Experiments on Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-12-25T06:31:11Z) - Multi-hop Reasoning via Early Knowledge Alignment [68.28168992785896]
Early Knowledge Alignment (EKA) aims to align Large Language Models with contextually relevant retrieved knowledge. EKA significantly improves retrieval precision, reduces cascading errors, and enhances both performance and efficiency. EKA proves effective as a versatile, training-free inference strategy that scales seamlessly to large models.
arXiv Detail & Related papers (2025-12-23T08:14:44Z) - Chained Prompting for Better Systematic Review Search Strategies [0.6633201258809686]
We introduce a Large Language Model-based chained prompt engineering framework for the automated development of search strategies in systematic reviews. The framework replicates the procedural structure of manual search design while leveraging LLMs to decompose review objectives, extract PICO elements, generate conceptual representations, expand terminologies, and synthesize queries.
arXiv Detail & Related papers (2025-11-28T12:12:38Z) - Reinforced Strategy Optimization for Conversational Recommender Systems via Network-of-Experts [63.412646471177645]
We propose a novel Reinforced Strategy Optimization (RSO) method for Conversational Recommender Systems (CRSs). RSO decomposes the process of generating strategy-driven response decisions into macro-level strategy planning and micro-level strategy adaptation. Experiments show that RSO significantly improves interaction performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-30T11:12:01Z) - Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning [56.496001894673235]
Reinforcement Learning (RL) has proven highly effective at enhancing the complex reasoning abilities of Large Language Models (LLMs). Our analysis reveals that puzzling phenomena like "aha moments", "length-scaling", and entropy dynamics are not disparate occurrences but hallmarks of an emergent reasoning hierarchy.
arXiv Detail & Related papers (2025-09-03T18:52:49Z) - KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG [63.82127103851471]
Retrieval-Augmented Generation (RAG) enables large language models to access broader knowledge sources. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. We present KARE-RAG, which improves knowledge utilization through three key innovations.
arXiv Detail & Related papers (2025-06-03T06:31:17Z) - Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning [45.10424242207931]
Retrieval-augmented generation (RAG) enhances the text generation capabilities of large language models (LLMs). We introduce ReasonRAG, a novel method that automatically constructs RAG-ProGuide, a high-quality dataset providing process-level rewards for query generation, evidence extraction, and answer generation. With process-level policy optimization, the proposed framework empowers LLMs to autonomously invoke search, generate queries, extract relevant evidence, and produce final answers.
arXiv Detail & Related papers (2025-05-20T08:21:00Z) - DYSTIL: Dynamic Strategy Induction with Large Language Models for Reinforcement Learning [27.336254612018404]
Reinforcement learning from expert demonstrations has long remained a challenging research problem. Existing state-of-the-art methods using behavioral cloning plus further RL training often suffer from poor generalization, low sample efficiency, and poor model interpretability. We propose a novel strategy-based reinforcement learning framework integrated with large language models (LLMs) to overcome these limitations.
arXiv Detail & Related papers (2025-05-06T05:53:09Z) - Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval [49.669503570350166]
Generative information retrieval (GenIR) is a promising neural retrieval paradigm that formulates document retrieval as a document identifier (docid) generation task. Existing GenIR models suffer from token-level misalignment, where models trained to predict the next token often fail to capture document-level relevance effectively. We propose direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through direct optimization via pairwise ranking.
arXiv Detail & Related papers (2025-04-07T15:27:37Z) - Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding [2.368662284133926]
We present a framework for enhancing Retrieval-Augmented Generation (RAG) systems through dynamic retrieval strategies and reinforcement fine-tuning. Our framework integrates two complementary techniques: Policy-Optimized Retrieval-Augmented Generation (PORAG) and Adaptive Token-Layer Attention Scoring (ATLAS). Our framework reduces hallucinations, strengthens domain-specific reasoning, and achieves significant efficiency and scalability gains over traditional RAG systems.
arXiv Detail & Related papers (2025-04-02T01:16:10Z) - Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv Detail & Related papers (2025-03-24T17:59:03Z) - RAG-RL: Advancing Retrieval-Augmented Generation via RL and Curriculum Learning [24.648819770922515]
We introduce RAG-RL, an answer generation model trained not only to produce answers but also to identify and cite relevant information from larger sets of retrieved contexts. Our approach uses curriculum learning, where the model is first trained on easier examples that include only relevant contexts. Our experiments show that these training samples enable models to acquire citation and reasoning skills with greater sample efficiency and generalizability.
arXiv Detail & Related papers (2025-03-17T02:53:42Z) - Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs? [69.38149239733994]
We investigate whether complex robust training strategies remain necessary as model capacity grows. We find that as models become more powerful, the performance gains brought by complex robust training methods drop off dramatically. Our findings suggest that RAG systems can benefit from simpler architectures and training strategies as models become more powerful.
arXiv Detail & Related papers (2025-02-17T03:34:31Z) - REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z) - Learning to Rank in Generative Retrieval [62.91492903161522]
Generative retrieval aims to generate identifier strings of relevant passages as the retrieval target.
We propose a learning-to-rank framework for generative retrieval, dubbed LTRGR.
This framework only requires an additional learning-to-rank training phase to enhance current generative retrieval systems.
arXiv Detail & Related papers (2023-06-27T05:48:14Z) - Reinforcement Learning in Credit Scoring and Underwriting [7.356954349107956]
We adapt reinforcement learning principles for credit scoring, incorporating action space renewal and multi-choice actions.
We introduce two new RL-based credit underwriting algorithms to enable more informed decision-making.
arXiv Detail & Related papers (2022-12-15T06:36:14Z) - Revisiting GANs by Best-Response Constraint: Perspective, Methodology, and Application [49.66088514485446]
Best-Response Constraint (BRC) is a general learning framework to explicitly formulate the potential dependency of the generator on the discriminator.
We show that, even with different motivations and formulations, a variety of existing GANs can all be uniformly improved by our flexible BRC methodology.
arXiv Detail & Related papers (2022-05-20T12:42:41Z)
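The REX entry above mentions integrating concepts similar to Upper Confidence Bound (UCB) scores. As a generic illustration only (not REX's actual algorithm), the standard UCB1 rule balances an action's mean reward against an exploration bonus; the strategy names and counts below are invented for the example:

```python
import math

def ucb1_score(mean_reward, action_count, total_count, c=1.41):
    # Unvisited actions get an infinite score so each is tried at least once.
    if action_count == 0:
        return float("inf")
    # Exploitation term plus an exploration bonus that shrinks as the
    # action is sampled more often.
    return mean_reward + c * math.sqrt(math.log(total_count) / action_count)

def select_action(stats):
    # stats maps each action to (mean_reward, count).
    total = sum(count for _, count in stats.values())
    return max(stats, key=lambda a: ucb1_score(*stats[a], total))

stats = {"expand": (0.6, 10), "disambiguate": (0.5, 2), "rephrase": (0.0, 0)}
print(select_action(stats))  # → rephrase (the unvisited action is tried first)
```

The exploration bonus makes rarely-tried actions competitive with well-tried ones, which is the intuition behind the "more robust and efficient AI agent performance" claimed for REX.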
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.