DeepResearch-Slice: Bridging the Retrieval-Utilization Gap via Explicit Text Slicing
- URL: http://arxiv.org/abs/2601.03261v1
- Date: Tue, 16 Dec 2025 07:07:28 GMT
- Title: DeepResearch-Slice: Bridging the Retrieval-Utilization Gap via Explicit Text Slicing
- Authors: Shuo Lu, Yinuo Xu, Jianjie Cheng, Lingxiao He, Meng Wang, Jian Liang
- Abstract summary: We propose DeepResearch-Slice to bridge the retrieval-utilization gap. Unlike implicit attention, our approach predicts precise span indices to perform a deterministic hard filter before reasoning.
- Score: 20.480828184335856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Research agents predominantly optimize search policies to maximize retrieval probability. However, we identify a critical bottleneck: the retrieval-utilization gap, where models fail to use gold evidence even after it is retrieved, due to context blindness in noisy environments. To bridge this gap, we propose DeepResearch-Slice, a simple yet effective neuro-symbolic framework. Unlike implicit attention, our approach predicts precise span indices to perform a deterministic hard filter before reasoning. Extensive evaluations across six benchmarks show substantial robustness gains. Applying our method to frozen backbones yields a 73% relative improvement (from 19.1% to 33.0%), effectively mitigating noise without requiring parameter updates to the reasoning model. These results highlight the need for explicit grounding mechanisms in open-ended research.
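As a concrete illustration of the slice-then-reason idea, here is a minimal Python sketch: a slicer predicts span indices over a retrieved document, and a deterministic hard filter keeps only those spans before the frozen reasoner ever sees the context. The `predict_spans` heuristic below is a stand-in assumption; the paper's learned span predictor is not reproduced.

```python
import re

def predict_spans(question: str, document: str) -> list[tuple[int, int]]:
    # Placeholder slicer: keeps sentences sharing a word with the question.
    # In the paper this role is played by a learned span-index predictor; its
    # real interface is an assumption here -- only the hard-filter step below
    # follows the abstract.
    q_words = set(re.findall(r"\w+", question.lower()))
    spans, pos = [], 0
    for sent in re.split(r"(?<=[.!?])\s+", document):
        start = document.find(sent, pos)
        end = start + len(sent)
        pos = end
        if q_words & set(re.findall(r"\w+", sent.lower())):
            spans.append((start, end))
    return spans

def hard_filter(question: str, document: str) -> str:
    # Deterministic hard filter: keep only the predicted spans, drop the rest.
    spans = sorted(predict_spans(question, document))
    return "\n".join(document[s:e] for s, e in spans)

if __name__ == "__main__":
    doc = "The tower opened in 1889. It is in Paris. Bad weather is expected."
    print(hard_filter("When did the tower open?", doc))
    # -> "The tower opened in 1889."  (noise sentences are dropped)
```

Because the filter is a hard, deterministic operation on indices rather than a soft attention weighting, it composes with any frozen reasoning model, which is consistent with the abstract's no-parameter-update claim.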
Related papers
- Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization [64.61432234404276]
*Search More, Think Less* (SMTL) is a framework for long-horizon agentic search that targets both efficiency and generalization. We train an end-to-end agent using supervised fine-tuning and reinforcement learning, achieving strong and often state-of-the-art performance across benchmarks.
arXiv Detail & Related papers (2026-02-26T06:46:41Z) - Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models [2.5170433424424874]
Reinforcement Learning with Verifiable Rewards has established itself as the dominant paradigm for instilling rigorous reasoning capabilities in Large Language Models. We identify a critical pathology in this alignment process: the systematic suppression of valid but rare (low-likelihood under the base model distribution) reasoning paths. We propose Amortized Reasoning Tree Search (ARTS) to counteract this collapse without discarding the base model's latent diversity.
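A minimal sketch of the proposal/decision decoupling described above, assuming a frozen base model proposes candidate reasoning steps while a separately trained decision model prunes the tree; `propose`, `score`, and the width/depth values are illustrative assumptions, not ARTS components:

```python
import heapq

def arts_search(root, propose, score, width=4, depth=3):
    # `propose(path)` samples candidate next steps from the frozen base model,
    # so rare-but-valid paths stay in play; `score(path)` is a separate learned
    # decision model. Only the decision model prunes -- the proposal
    # distribution itself is never tempered.
    frontier = [root]  # root is a (possibly empty) list of reasoning steps
    for _ in range(depth):
        candidates = [path + [step] for path in frontier for step in propose(path)]
        if not candidates:
            break
        frontier = heapq.nlargest(width, candidates, key=score)
    return max(frontier, key=score)
```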
arXiv Detail & Related papers (2026-02-13T11:52:50Z) - Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning [53.58654277639939]
In-context exploration is the intrinsic ability to generate, verify, and refine hypotheses within a single continuous context. We propose Length-Incentivized Exploration, which explicitly encourages models to explore more. Our method achieves an average improvement of 4.4% on in-domain tasks and a 2.7% gain on out-of-domain benchmarks.
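The length incentive can be pictured as simple reward shaping; the linear bonus, `lam`, and the cap below are assumptions, since the summary only states that longer exploration is explicitly encouraged:

```python
def shaped_reward(task_reward: float, num_tokens: int,
                  lam: float = 0.01, cap: int = 4096) -> float:
    # Add a bonus that grows with the length of the in-context exploration
    # trace, capped so the policy cannot game the reward by rambling forever.
    # The linear form and constants are illustrative, not the paper's values.
    length_bonus = lam * min(num_tokens, cap) / cap
    return task_reward + length_bonus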
arXiv Detail & Related papers (2026-02-12T09:24:32Z) - LightSearcher: Efficient DeepSearch via Experiential Memory [23.338677838845]
We propose an efficient reinforcement learning framework that balances accuracy and efficiency in DeepSearch paradigms. Experiments on four multi-hop QA benchmarks show that LightSearcher maintains accuracy comparable to the SOTA baseline ReSearch.
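One plausible reading of 'experiential memory' is a reuse cache keyed by query similarity, sketched below; the embedding function, cosine matching, and the 0.85 threshold are assumptions, not LightSearcher's actual mechanism:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

class ExperientialMemory:
    # Before paying for a fresh retrieval call, reuse evidence gathered for a
    # semantically similar past query. `embed` and the threshold are assumed.
    def __init__(self, embed, threshold=0.85):
        self.embed, self.threshold, self.entries = embed, threshold, []

    def lookup(self, query):
        q = self.embed(query)
        scored = [(cosine(q, e), docs) for e, docs in self.entries]
        sim, docs = max(scored, default=(0.0, None))
        return docs if sim >= self.threshold else None

    def add(self, query, docs):
        self.entries.append((self.embed(query), docs))
```

In an agent loop this would look like `docs = memory.lookup(q) or search(q)` followed by `memory.add(q, docs)`, trading a cheap similarity check for an expensive search call.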
arXiv Detail & Related papers (2025-12-07T04:29:52Z) - BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent [74.10138164281618]
BrowseComp-Plus is a benchmark derived from BrowseComp, employing a fixed, carefully curated corpus. This benchmark allows comprehensive evaluation and disentangled analysis of deep research agents and retrieval methods.
arXiv Detail & Related papers (2025-08-08T17:55:11Z) - Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation [37.47571308389908]
Retrieval-Augmented Generation (RAG) effectively improves the accuracy of Large Language Models (LLMs). Previous methods extract evidence straightforwardly without explicit thinking, which risks filtering out key clues and struggles with generalization. We propose Evi Omni, which learns to extract rational evidence by (1) explicitly reasoning to identify potential cues within retrieved contents first, and then (2) consciously extracting to avoid omitting any key cues helpful for answering questions.
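A hedged sketch of the two-stage recipe in the summary (reason about cues first, then extract evidence conditioned on them); the prompt wording is invented for illustration and is not the paper's:

```python
def extract_rational_evidence(llm, question: str, passages: list[str]) -> str:
    # Stage 1: explicit reasoning to surface every potential cue.
    # Stage 2: extraction conditioned on those cues, so no key clue is dropped.
    context = "\n\n".join(passages)
    cues = llm(f"Question: {question}\n\nPassages:\n{context}\n\n"
               "Step 1: List every clue in the passages that could help answer.")
    evidence = llm(f"Question: {question}\n\nPassages:\n{context}\n\n"
                   f"Identified clues:\n{cues}\n\n"
                   "Step 2: Extract verbatim the sentences covering all clues.")
    return evidence
```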
arXiv Detail & Related papers (2025-07-21T13:03:55Z) - Benchmarking Reasoning Robustness in Large Language Models [76.79744000300363]
This paper introduces a novel benchmark, termed Math-RoB, that exploits hallucinations triggered by missing information to expose reasoning gaps. We find significant performance degradation on novel or incomplete data. These findings highlight a reliance on recall over rigorous logical inference.
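The missing-information probe might be constructed roughly as follows; the item format and the keyword-based hallucination check are assumptions for illustration, not Math-RoB's actual protocol:

```python
def make_missing_info_probe(premises: list[str], question: str, drop: int) -> str:
    # Delete one necessary premise so the item becomes unanswerable, then test
    # whether the model invents an answer anyway (assumed construction).
    kept = [p for i, p in enumerate(premises) if i != drop]
    return " ".join(kept) + " " + question

def hallucinated(answer: str) -> bool:
    # A definite answer despite missing information counts as a hallucination;
    # a robust model should flag the gap instead (assumed keyword-based check).
    flags = ("cannot", "missing", "insufficient", "not enough")
    return not any(k in answer.lower() for k in flags)
```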
arXiv Detail & Related papers (2025-03-06T15:36:06Z) - DeepRAG: Thinking to Retrieve Step by Step for Large Language Models [92.87532210660456]
We propose DeepRAG, a framework that models retrieval-augmented reasoning as a Markov Decision Process (MDP). By iteratively decomposing queries, DeepRAG dynamically determines whether to retrieve external knowledge or rely on parametric reasoning at each step. Experiments show that DeepRAG improves retrieval efficiency and boosts answer accuracy by 26.4%, demonstrating its effectiveness in enhancing retrieval-augmented reasoning.
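A rough sketch of the per-step retrieve-or-answer decision loop the summary describes; the prompts and the textual action protocol are assumptions rather than DeepRAG's implementation:

```python
def deep_rag(llm, retrieve, question: str, max_steps: int = 8) -> str:
    # Iteratively decompose the question into subqueries; at each step, decide
    # between retrieving external knowledge and answering from parametric memory.
    notes = []
    for _ in range(max_steps):
        state = f"Question: {question}\nNotes:\n" + "\n".join(notes)
        sub = llm(state + "\nNext subquery, or 'FINAL: <answer>' if done:")
        if sub.startswith("FINAL:"):
            return sub[len("FINAL:"):].strip()
        decision = llm(state + f"\nSubquery: {sub}\n"
                       "Reply 'RETRIEVE' if external knowledge is needed, "
                       "otherwise answer from memory:")
        obs = retrieve(sub) if decision.strip().startswith("RETRIEVE") else decision
        notes.append(f"{sub} -> {obs}")
    return llm(f"Question: {question}\nNotes:\n" + "\n".join(notes) + "\nFinal answer:")
```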
arXiv Detail & Related papers (2025-02-03T08:22:45Z) - Efficient and Robust Point Cloud Registration via Heuristics-guided Parameter Search [44.774302677330105]
Estimating rigid transformation with 6 degrees of freedom based on a putative 3D correspondence set is a crucial procedure in point cloud registration.
This paper proposes a parameter search strategy to accelerate the search while maintaining high robustness.
Our strategy substantially reduces the search space and can guarantee accuracy with only a few inlier samples.
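A generic coarse-to-fine parameter search scored by inlier count, to make the shrink-the-search-space idea concrete; the angle grid, centroid-based translation, and all constants are assumptions, not the paper's heuristics:

```python
import itertools
import numpy as np

def rotation(yaw, pitch, roll):
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def register(src, dst, tol=0.05, steps=6, rounds=3):
    # src, dst: (N, 3) arrays of putative correspondences. Each round scans a
    # grid over the three Euler angles, fixes the translation by aligning
    # centroids under the candidate rotation, scores by inlier count, then
    # shrinks the search window around the best cell (assumed scheme).
    center, half = np.zeros(3), np.pi
    best_n, best_R, best_t, best_angles = -1, np.eye(3), np.zeros(3), center
    for _ in range(rounds):
        for d in itertools.product(np.linspace(-half, half, steps), repeat=3):
            R = rotation(*(center + d))
            t = dst.mean(axis=0) - (src @ R.T).mean(axis=0)
            n = int(np.sum(np.linalg.norm(src @ R.T + t - dst, axis=1) < tol))
            if n > best_n:
                best_n, best_R, best_t, best_angles = n, R, t, center + d
        center, half = best_angles, 2 * half / steps  # refine around the best cell
    return best_R, best_t
```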
arXiv Detail & Related papers (2024-04-09T09:28:05Z) - Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Models (LLMs).
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by 6.34%, 9.56%, and 5.46% on GSM8K, AQuA, and StrategyQA, respectively.
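In outline, the decoding algorithm can be seen as beam search whose step scores mix generation log-probability with a self-evaluation confidence; the geometric mixing rule and `alpha` below are assumptions:

```python
import math

def self_eval_beam_search(generate_steps, self_eval, prompt,
                          beam=4, depth=5, alpha=0.5):
    # `generate_steps(prompt, steps)` yields (step, logp) candidates;
    # `self_eval(prompt, steps)` returns the model's confidence in (0, 1].
    # Each partial chain is scored by a weighted sum of generation
    # log-probability and log-confidence (assumed combination).
    beams = [([], 0.0)]  # (steps so far, cumulative score)
    for _ in range(depth):
        expanded = []
        for steps, score in beams:
            for step, logp in generate_steps(prompt, steps):
                conf = self_eval(prompt, steps + [step])
                new_score = score + alpha * logp + (1 - alpha) * math.log(conf)
                expanded.append((steps + [step], new_score))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam]
    return beams[0][0]
```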
arXiv Detail & Related papers (2023-05-01T02:37:59Z) - Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions [17.097192646470372]
We introduce several strategies to improve the scalability of exact score-based methods in the linear Gaussian setting.
We develop a super-structure estimation method based on the support of the inverse covariance matrix, which requires assumptions strictly weaker than faithfulness.
We also propose a local search strategy that performs exact search on the local clusters formed by each variable and its neighbors within two hops in the super-structure.
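The two ingredients above can be sketched as follows, assuming a plain (unregularized) precision-matrix estimate and a fixed support threshold in place of whatever estimator the paper actually uses:

```python
import numpy as np

def superstructure(X, thresh=1e-2):
    # Super-structure from the support of the inverse covariance matrix, per
    # the abstract. The plain inverse and fixed threshold are simplifying
    # assumptions; a regularized estimator would be used in practice.
    prec = np.linalg.inv(np.cov(X, rowvar=False))
    A = np.abs(prec) > thresh
    np.fill_diagonal(A, False)
    return A

def local_cluster(A, i):
    # Variables within two hops of i in the super-structure; exact score-based
    # search is then run on each such local cluster, per the local strategy.
    one_hop = set(np.flatnonzero(A[i]))
    two_hop = set().union(*(set(np.flatnonzero(A[j])) for j in one_hop)) if one_hop else set()
    return sorted({i} | one_hop | two_hop)
```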
arXiv Detail & Related papers (2022-01-14T20:52:30Z) - Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
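For reference, the triangular discrimination is the standard f-divergence defined below; its exact role in the paper's first-order bounds is not reproduced here, but its well-known equivalence to squared Hellinger distance (a fact independent of this paper) gives intuition for why it behaves like a distance between distributions:

```latex
% Triangular discrimination between distributions P and Q on a countable set:
\[
  D_{\triangle}(P, Q) \;=\; \sum_{x} \frac{\bigl(P(x) - Q(x)\bigr)^{2}}{P(x) + Q(x)} .
\]
% It is sandwiched by the squared Hellinger distance up to a factor of 2,
% since (sqrt{p}+sqrt{q})^2 / (p+q) lies in [1, 2] termwise:
\[
  H^{2}(P, Q) \;\le\; D_{\triangle}(P, Q) \;\le\; 2\, H^{2}(P, Q),
  \qquad H^{2}(P, Q) = \sum_{x} \bigl(\sqrt{P(x)} - \sqrt{Q(x)}\bigr)^{2}.
\]
```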
arXiv Detail & Related papers (2021-07-05T19:20:34Z)