Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models
- URL: http://arxiv.org/abs/2511.07581v1
- Date: Wed, 12 Nov 2025 01:05:29 GMT
- Title: Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models
- Authors: Supriti Vijay, Aman Priyanshu, Anu Vellore, Baturay Saglam, Amin Karbasi
- Abstract summary: We introduce Orion, a training framework that enables compact models to perform iterative retrieval through learned search strategies. Orion combines synthetic trajectory generation and supervised fine-tuning to encourage diverse exploration patterns in models. Despite using only 3% of the training data available, our 1.2B model achieves 77.6% success on SciFact.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective information retrieval requires reasoning over partial evidence and refining strategies as information emerges. Yet current approaches fall short: neural retrievers lack reasoning capabilities, large language models (LLMs) provide semantic depth but at prohibitive cost, and query rewriting or decomposition limits improvement to static transformations. As a result, existing methods fail to capture the iterative dynamics of exploration, feedback, and revision that complex user queries demand. We introduce Orion, a training framework that enables compact models (350M-1.2B parameters) to perform iterative retrieval through learned search strategies. Orion combines: (1) synthetic trajectory generation and supervised fine-tuning to encourage diverse exploration patterns in models, (2) reinforcement learning (RL) that rewards effective query refinement and backtracking behaviors, and (3) inference-time beam search algorithms that exploit the self-reflection capabilities learned during RL. Despite using only 3% of the training data available, our 1.2B model achieves 77.6% success on SciFact (vs. 72.6% for prior retrievers), 25.2% on BRIGHT (vs. 22.1%), 63.2% on NFCorpus (vs. 57.8%), and remains competitive on FEVER, HotpotQA, and MSMarco. It outperforms retrievers up to 200-400x larger on five of six benchmarks. These findings suggest that retrieval performance can emerge from learned strategies, not just model scale, when models are trained to search, reflect, and revise.
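The abstract's search loop (explore, read the relevance feedback, refine the query, backtrack when refinement stops helping, all driven by a beam over candidate queries) can be sketched in miniature. The corpus, the token-overlap scorer, and the refinement rule below are toy stand-ins chosen for illustration, not Orion's actual components, which use trained language models rather than hand-written heuristics.

```python
# Toy corpus and scorer; stand-ins for a real retriever's index and ranker.
CORPUS = {
    "d1": "aspirin reduces cardiovascular risk in adults",
    "d2": "vitamin c does not prevent the common cold",
    "d3": "exercise improves cardiovascular health outcomes",
}

def score(query, doc):
    """Token-overlap relevance: fraction of query tokens present in the doc."""
    q = set(query.split())
    return len(q & set(doc.split())) / max(len(q), 1)

def best_hit(query):
    """Return (doc_id, score) for the highest-scoring document."""
    doc_id = max(CORPUS, key=lambda d: score(query, CORPUS[d]))
    return doc_id, score(query, CORPUS[doc_id])

def iterative_search(query, vocab, steps=3, beam_width=2):
    """Beam search over query refinements, backtracking when nothing improves."""
    beam = [query]
    best_q, best_s = query, best_hit(query)[1]
    for _ in range(steps):
        candidates = []
        for q in beam:
            for term in vocab:                     # explore: add one term
                refined = f"{q} {term}"
                candidates.append((best_hit(refined)[1], refined))
        candidates.sort(reverse=True)
        beam = [q for _, q in candidates[:beam_width]]
        if candidates and candidates[0][0] > best_s:
            best_s, best_q = candidates[0]         # feedback: keep improvement
        else:
            beam = [best_q]                        # backtrack to best so far
    return best_q, best_hit(best_q)[0]
```

For example, `iterative_search("heart risk", ["cardiovascular", "aspirin", "cold"])` progressively appends the terms that raise the top document's score and reverts once no candidate improves on the best query found so far; in the trained system, the "vocabulary" of refinements is generated by the model itself.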
Related papers
- LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum [73.82125917416067]
LACONIC is a family of learned sparse retrievers based on the Llama-3 architecture. The 8B variant achieves a state-of-the-art 60.2 nDCG on the MTEB Retrieval benchmark, ranking 15th on the leaderboard.
arXiv Detail & Related papers (2026-01-04T22:42:20Z) - Representation-Based Exploration for Language Models: From Test-Time to Post-Training [50.144031964319424]
Reinforcement learning (RL) promises to expand the capabilities of language models. It is unclear if current RL techniques promote the discovery of novel behaviors, or simply sharpen those already present in the base model. We investigate the value of deliberate exploration -- explicitly incentivizing the model to discover novel and diverse behaviors.
arXiv Detail & Related papers (2025-10-13T17:49:05Z) - ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context [66.15505423059234]
We introduce ASTRO, a framework for training language models to reason like search algorithms. We apply ASTRO to the Llama 3 family of models and achieve absolute performance gains of 16.4% on MATH-500, 26.9% on AMC 2023, and 20.0% on AIME 2024.
arXiv Detail & Related papers (2025-07-01T04:10:15Z) - s3: You Don't Need That Much Data to Train a Search Agent via RL [34.862294169425724]
Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. We propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward.
arXiv Detail & Related papers (2025-05-20T09:53:56Z) - SEM: Reinforcement Learning for Search-Efficient Large Language Models [26.075903427834838]
Large Language Models (LLMs) have demonstrated their capabilities not only in reasoning but also in invoking external tools. Existing reinforcement learning approaches often lead to redundant search behaviors, resulting in inefficiency and excess cost. We propose SEM, a novel post-training reinforcement learning framework that explicitly trains LLMs to optimize search usage.
arXiv Detail & Related papers (2025-05-12T09:45:40Z) - Reinforcement Learning for Reasoning in Large Language Models with One Training Example [117.86853102104256]
We show that reinforcement learning with verifiable reward using one training example (1-shot RLVR) is effective in incentivizing the math reasoning capabilities of large language models (LLMs). We identify some interesting phenomena during 1-shot RLVR, including cross-category generalization, increased frequency of self-reflection, and sustained test performance improvement.
arXiv Detail & Related papers (2025-04-29T09:24:30Z) - CSPLADE: Learned Sparse Retrieval with Causal Language Models [13.999080540889494]
We identify two challenges in training large language models (LLMs) for learned sparse retrieval (LSR). We propose two corresponding techniques: (1) a lightweight adaptation training phase to eliminate training instability; (2) two model variants to enable bidirectional information. With these techniques, we are able to train LSR models with an 8B-scale LLM and achieve competitive retrieval performance with reduced index size.
arXiv Detail & Related papers (2025-04-15T02:31:34Z) - An Empirical Study on Eliciting and Improving R1-like Reasoning Models [90.52239241349504]
Scaling RL training has become a central technique for implementing such reasoning models. We demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models. We also explore the use of tool manipulation, finding that it significantly boosts the reasoning performance of large reasoning models.
arXiv Detail & Related papers (2025-03-06T15:34:27Z) - DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning [44.806321084404324]
DeepRetrieval is a reinforcement learning (RL) approach that trains LLMs for query generation through trial and error without supervised data. Using retrieval metrics as rewards, our system generates queries that maximize retrieval performance.
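The core idea summarized above, using a retrieval metric directly as the RL reward with no supervised query labels, can be illustrated with a deliberately tiny stand-in: a softmax bandit choosing among hand-written rewrite strategies via a REINFORCE-style update. The corpus, the rewrite names, and the overlap-based reward are all hypothetical illustrations; the actual system trains an LLM policy that generates free-form queries.

```python
import math
import random

# Toy corpus and hand-written rewrite "actions" (illustrative stand-ins).
CORPUS = {
    "d1": "statin therapy lowers ldl cholesterol",
    "d2": "ldl cholesterol raises heart disease risk",
}
REWRITES = {
    "identity":  lambda q: q,
    "map_terms": lambda q: q.replace("drug", "therapy").replace("danger", "risk"),
    "truncate":  lambda q: q.split()[0],
}

def retrieval_reward(query, gold_doc):
    """Reward = 1.0 when token overlap ranks the gold document first."""
    def overlap(doc_text):
        q = set(query.split())
        return len(q & set(doc_text.split())) / max(len(q), 1)
    top = max(CORPUS, key=lambda d: overlap(CORPUS[d]))
    return 1.0 if top == gold_doc else 0.0

def train_policy(queries, epochs=300, lr=0.5, seed=0):
    """REINFORCE over a softmax policy whose actions are rewrite strategies."""
    rng = random.Random(seed)
    names = list(REWRITES)
    logits = dict.fromkeys(names, 0.0)
    for _ in range(epochs):
        query, gold = rng.choice(queries)
        z = sum(math.exp(v) for v in logits.values())
        probs = {n: math.exp(logits[n]) / z for n in names}
        choice = rng.choices(names, weights=[probs[n] for n in names])[0]
        reward = retrieval_reward(REWRITES[choice](query), gold)
        # Policy-gradient update: only the retrieval metric supervises learning.
        for n in names:
            grad = (1.0 if n == choice else 0.0) - probs[n]
            logits[n] += lr * reward * grad
    return max(logits, key=logits.get)
```

On queries like `[("cholesterol drug", "d1"), ("cholesterol danger", "d2")]`, the term-mapping rewrite is the only arm that ranks the gold document first for both queries, so the metric-only reward should steer the policy toward it without any labeled query pairs.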
arXiv Detail & Related papers (2025-02-28T22:16:42Z) - The Surprising Effectiveness of Test-Time Training for Few-Shot Learning [59.309477460893916]
Language models (LMs) have shown impressive performance on tasks within their training distribution, but often struggle with structurally novel tasks. We investigate the effectiveness of test-time training (TTT) as a mechanism for improving LMs' reasoning and few-shot learning capabilities. Our findings highlight the limitations of in-context learning for novel tasks and demonstrate the potential of test-time training to enhance language model adaptability.
arXiv Detail & Related papers (2024-11-11T18:59:45Z) - Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.