Fugu-MT Paper Translation (Abstract): SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Paper overview: SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

arxiv url: http://arxiv.org/abs/2605.29796v1
Date: Thu, 28 May 2026 11:45:45 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.209342
Title: SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
Title (reference translation): SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
Authors: Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang, Zerui Chen, Qinggang Zhang, Jinsong Su
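Abstract summary: The lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. This paper proposes SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy.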
Reference score (independently computed attention score): 38.532946868233736
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. This lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search while maintaining accuracy. Our code is anonymously released at https://github.com/XMUDeepLIT/SAAS.
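Abstract (reference translation): Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Agents fail to recognize the boundaries of their own knowledge, blindly triggering searches when internal knowledge is sufficient and failing to terminate search even when adequate evidence has been collected. The lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. This paper therefore proposes SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism that identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module that translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy that leverages a sequential curriculum to prioritize reasoning over search regularization, avoiding reward hacking. Extensive experiments show that SAAS substantially reduces over-search while maintaining accuracy. The code is anonymously released at https://github.com/XMUDeepLIT/SAAS.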
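
Illustrative sketch (not from the paper): the abstract names three components but gives no formulas, so the Python below is only one possible reading of them, written under stated assumptions. Every name here (Rollout, needs_search, min_sufficient_searches, boundary_aware_reward), the 0.5 accuracy threshold, the 0.1 per-search penalty, the choice to penalize only correct trajectories, and the two-stage switch are hypothetical and are not taken from the SAAS code release.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rollout:
    correct: bool      # did the final answer match the reference?
    num_searches: int  # number of search calls made in this trajectory


def needs_search(no_search_rollouts: List[Rollout], threshold: float = 0.5) -> bool:
    """Assumed boundary test: a question needs search if the policy's
    search-disabled rollouts answer it correctly less often than `threshold`."""
    if not no_search_rollouts:
        return True
    accuracy = sum(r.correct for r in no_search_rollouts) / len(no_search_rollouts)
    return accuracy < threshold


def min_sufficient_searches(search_rollouts: List[Rollout]) -> Optional[int]:
    """Assumed budget: the fewest searches used by any correct
    search-enabled rollout, or None if no rollout was correct."""
    counts = [r.num_searches for r in search_rollouts if r.correct]
    return min(counts) if counts else None


def boundary_aware_reward(
    rollout: Rollout,
    search_needed: bool,
    budget: Optional[int],
    stage: int = 2,
    penalty: float = 0.1,
) -> float:
    """Trajectory-level reward under the assumptions above.

    Stage 1 rewards accuracy only; stage 2 additionally subtracts a
    penalty for each search beyond the estimated boundary."""
    base = 1.0 if rollout.correct else 0.0
    # Assumption: only correct trajectories are penalized, so the penalty
    # never pushes the policy to trade accuracy for fewer searches.
    if stage == 1 or not rollout.correct:
        return base
    if not search_needed:
        excess = rollout.num_searches          # every search is unnecessary
    elif budget is None:
        excess = 0                             # no reference budget available
    else:
        excess = max(0, rollout.num_searches - budget)  # redundant searches
    return max(0.0, base - penalty * excess)
```

In this reading, a training loop would call needs_search and min_sufficient_searches once per question on rollouts sampled from the current policy, then score each search-enabled trajectory with boundary_aware_reward, switching stage from 1 to 2 once accuracy has stabilized.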

Related papers