To Search or Not to Search: Aligning the Decision Boundary of Deep Search Agents via Causal Intervention
- URL: http://arxiv.org/abs/2602.03304v1
- Date: Tue, 03 Feb 2026 09:29:06 GMT
- Title: To Search or Not to Search: Aligning the Decision Boundary of Deep Search Agents via Causal Intervention
- Authors: Wenlin Zhang, Kuicai Dong, Junyi Li, Yingyi Zhang, Xiaopeng Li, Pengyue Jia, Yi Wen, Derong Xu, Maolin Wang, Yichao Wang, Yong Liu, Xiangyu Zhao
- Abstract summary: We identify the root cause of misaligned decision boundaries, the threshold determining when accumulated information suffices to answer. This causes over-search (redundant searching despite sufficient knowledge) and under-search (premature termination yielding incorrect answers). We propose a comprehensive framework comprising two key components. First, we introduce causal intervention-based diagnosis that identifies boundary errors. Second, we develop Decision Boundary Alignment for Deep Search agents (DAS). Our DAS method effectively calibrates these boundaries, mitigating both over-search and under-search to achieve substantial gains in accuracy and efficiency.
- Score: 61.82680155643223
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep search agents, which autonomously iterate through multi-turn web-based reasoning, represent a promising paradigm for complex information-seeking tasks. However, current agents suffer from critical inefficiency: they conduct excessive searches as they cannot accurately judge when to stop searching and start answering. This stems from outcome-centric training that prioritizes final results over the search process itself. We identify the root cause as misaligned decision boundaries, the threshold determining when accumulated information suffices to answer. This causes over-search (redundant searching despite sufficient knowledge) and under-search (premature termination yielding incorrect answers). To address these errors, we propose a comprehensive framework comprising two key components. First, we introduce causal intervention-based diagnosis that identifies boundary errors by comparing factual and counterfactual trajectories at each decision point. Second, we develop Decision Boundary Alignment for Deep Search agents (DAS), which constructs preference datasets from causal feedback and aligns policies via preference optimization. Experiments on public datasets demonstrate that decision boundary errors are pervasive across state-of-the-art agents. Our DAS method effectively calibrates these boundaries, mitigating both over-search and under-search to achieve substantial gains in accuracy and efficiency. Our code and data are publicly available at: https://github.com/Applied-Machine-Learning-Lab/WWW2026_DAS.
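The diagnosis and dataset-construction steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it only assumes one plausible reading of "comparing factual and counterfactual trajectories": at each decision point the agent's actual action is flipped and the two outcomes are compared. All names here (`Step`, `diagnose`, `preference_pairs`, `answer_now`, `search_more`, `is_correct`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    context: str  # information accumulated before this decision point
    action: str   # factual action taken by the agent: "search" or "answer"


def diagnose(
    steps: List[Step],
    answer_now: Callable[[str], str],   # hypothetical: force an answer from this context
    search_more: Callable[[str], str],  # hypothetical: keep searching, then answer
    is_correct: Callable[[str], bool],  # hypothetical: check an answer against the gold label
) -> List[str]:
    """Label each decision point by flipping the factual action (assumed scheme)."""
    labels = []
    for step in steps:
        can_answer = is_correct(answer_now(step.context))
        if step.action == "search":
            # Counterfactual: answer instead of searching.
            labels.append("over-search" if can_answer else "ok")
        elif can_answer:
            labels.append("ok")
        elif is_correct(search_more(step.context)):
            # Counterfactual: searching further would have fixed the answer.
            labels.append("under-search")
        else:
            labels.append("unresolved")
    return labels


def preference_pairs(steps: List[Step], labels: List[str]) -> List[Tuple[str, str, str]]:
    """Build (context, chosen_action, rejected_action) triples from boundary errors."""
    pairs = []
    for step, label in zip(steps, labels):
        if label == "over-search":
            pairs.append((step.context, "answer", "search"))
        elif label == "under-search":
            pairs.append((step.context, "search", "answer"))
    return pairs


# Toy environment: the context is sufficient once it contains the word "fact".
toy_answer = lambda ctx: "Paris" if "fact" in ctx else "unknown"
toy_search = lambda ctx: "Paris"
toy_check = lambda ans: ans == "Paris"

trajectory = [Step("q", "search"), Step("q fact", "search"), Step("q fact more", "answer")]
labels = diagnose(trajectory, toy_answer, toy_search, toy_check)
pairs = preference_pairs(trajectory, labels)
early_stop = diagnose([Step("q", "answer")], toy_answer, toy_search, toy_check)
```

In this toy run the second search is flagged as over-search because the context already suffices, and stopping on the bare question is flagged as under-search. The resulting triples have the shape a preference-optimization trainer typically consumes, though the actual data format used by DAS is not specified in the abstract.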
Related papers
- SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback [68.60326181052658]
We propose an agentic pipeline that automatically generates high-quality, difficulty-controlled deep search question-answer pairs. Our pipeline, SAGE, consists of a data generator which proposes QA pairs and a search agent which attempts to solve the generated question. Our intrinsic evaluation shows SAGE generates questions that require diverse reasoning strategies, while significantly increasing the correctness and difficulty of the generated data.
arXiv Detail & Related papers (2026-01-26T06:37:56Z) - AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning [61.974530499621274]
Overreliance on search introduces unnecessary cost and risks exposure to noisy or malicious content. We propose a two-stage, outcome-driven RL framework that disentangles problem solving from the decision of whether to invoke search. AdaSearch substantially improves knowledge-boundary awareness, reduces unnecessary search calls, preserves strong task performance, and offers more transparent, interpretable decision behaviors.
arXiv Detail & Related papers (2025-12-18T18:50:01Z) - Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics [89.1999907891494]
We present WebDetective, a benchmark of hint-free multi-hop questions paired with a controlled Wikipedia sandbox. Our evaluation of 25 state-of-the-art models reveals systematic weaknesses across all architectures. We develop an agentic workflow, EvidenceLoop, that explicitly targets the challenges our benchmark identifies.
arXiv Detail & Related papers (2025-10-01T07:59:03Z) - RAVine: Reality-Aligned Evaluation for Agentic Search [7.4420114967110385]
RAVine is a Reality-Aligned eValuation framework for agentic LLMs with search. RAVine targets multi-point queries and long-form answers that better reflect user intents. We benchmark a series of models using RAVine and derive several insights.
arXiv Detail & Related papers (2025-07-22T16:08:12Z) - Perception Matters: Enhancing Embodied AI with Uncertainty-Aware Semantic Segmentation [24.32551050538683]
Embodied AI has made significant progress acting in unexplored environments. Current search methods largely focus on dated perception models, neglect temporal aggregation, and transfer from ground truth directly to noisy perception at test time. We address the identified problems through calibrated perception probabilities and uncertainty across aggregation and found decisions.
arXiv Detail & Related papers (2024-08-05T08:14:28Z) - Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Models (LLMs).
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by 6.34%, 9.56%, and 5.46% on GSM8K, AQuA, and StrategyQA, respectively.
arXiv Detail & Related papers (2023-05-01T02:37:59Z) - Multi-Agent Active Search using Detection and Location Uncertainty [6.587280549237275]
Active search algorithms must contend with two types of uncertainty: detection uncertainty and location uncertainty.
We first propose an inference method to jointly handle both target detection and location uncertainty.
We then build a decision making algorithm that uses Thompson sampling to enable decentralized multi-agent active search.
arXiv Detail & Related papers (2022-03-09T04:53:37Z) - An Automated Approach to Causal Inference in Discrete Settings [8.242194776558895]
We show an algorithm to automatically bound causal effects using efficient dual relaxation and spatial branch-and-bound techniques.
The algorithm searches over admissible data-generating processes and outputs the most precise possible range consistent with available information.
It offers an additional guarantee we refer to as $\epsilon$-sharpness, characterizing the incomplete bounds.
arXiv Detail & Related papers (2021-09-28T03:55:32Z) - A2Log: Attentive Augmented Log Anomaly Detection [53.06341151551106]
Anomaly detection becomes increasingly important for the dependability and serviceability of IT services.
Existing unsupervised methods need anomaly examples to obtain a suitable decision boundary.
We develop A2Log, which is an unsupervised anomaly detection method consisting of two steps: Anomaly scoring and anomaly decision.
arXiv Detail & Related papers (2021-09-20T13:40:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.