Fugu-MT Paper Translation (Abstract): AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning

Paper overview: AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning

arxiv url: http://arxiv.org/abs/2512.16883v1
Date: Thu, 18 Dec 2025 18:50:01 GMT
Status: Translation complete
Last updated in system: 2025-12-19 18:10:32.228647
Title: AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning
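Title (reference translation): AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models with Reinforcement Learning
Authors: Tzu-Han Lin, Wei-Lin Chen, Chen-An Li, Hung-yi Lee, Yun-Nung Chen, Yu Meng
Abstract summary: Overreliance on search introduces unnecessary cost and risks exposure to noisy or malicious content. This paper proposes a two-stage, outcome-driven RL framework that disentangles problem solving from the decision of whether to invoke search. AdaSearch substantially improves knowledge-boundary awareness, reduces unnecessary search calls, preserves strong task performance, and offers transparent, interpretable decision behavior.
Reference score (independently computed attention score): 61.974530499621274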
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Equipping large language models (LLMs) with search engines via reinforcement learning (RL) has emerged as an effective approach for building search agents. However, overreliance on search introduces unnecessary cost and risks exposure to noisy or malicious content, while relying solely on parametric knowledge risks hallucination. The central challenge is to develop agents that adaptively balance parametric knowledge with external search, invoking search only when necessary. Prior work mitigates search overuse by shaping rewards around the number of tool calls. However, these penalties require substantial reward engineering, provide ambiguous credit assignment, and can be exploited by agents that superficially reduce calls. Moreover, evaluating performance solely through call counts conflates necessary and unnecessary search, obscuring the measurement of true adaptive behavior. To address these limitations, we first quantify the self-knowledge awareness of existing search agents via an F1-based decision metric, revealing that methods such as Search-R1 often overlook readily available parametric knowledge. Motivated by these findings, we propose AdaSearch, a simple two-stage, outcome-driven RL framework that disentangles problem solving from the decision of whether to invoke search, and makes this decision process explicit and interpretable. This transparency is crucial for high-stakes domains such as finance and medical question answering, yet is largely neglected by prior approaches. Experiments across multiple model families and sizes demonstrate that AdaSearch substantially improves knowledge-boundary awareness, reduces unnecessary search calls, preserves strong task performance, and offers more transparent, interpretable decision behaviors.
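Abstract (reference translation): Equipping large language models (LLMs) with search engines via reinforcement learning (RL) has emerged as an effective approach to building search agents. However, overreliance on search incurs unnecessary cost and risks exposure to noisy or malicious content, while relying solely on parametric knowledge risks hallucination. The central challenge is to develop agents that adaptively balance parametric knowledge against external search and invoke search only when necessary. Prior work mitigates search overuse by shaping rewards around the number of tool calls. However, such penalties require substantial reward engineering, give ambiguous credit assignment, and can be exploited by agents that only superficially reduce calls. Moreover, evaluating performance by call counts alone conflates necessary and unnecessary searches and obscures the measurement of truly adaptive behavior. To address these limitations, we first quantify the self-knowledge awareness of existing search agents with an F1-based decision metric, showing that methods such as Search-R1 often overlook readily available parametric knowledge. Motivated by these findings, we propose AdaSearch, a simple two-stage, outcome-driven RL framework that disentangles problem solving from the decision of whether to invoke search and makes that decision process explicit and interpretable. This transparency is crucial in high-stakes domains such as finance and medical question answering, yet prior approaches have largely neglected it. Experiments across multiple model families and sizes show that AdaSearch substantially improves knowledge-boundary awareness, reduces unnecessary search calls, preserves strong task performance, and offers more transparent, interpretable decision behavior.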
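
Illustrative note: the abstract mentions an F1-based decision metric but does not define it here. The following is a minimal sketch of one plausible reading, not the paper's actual formulation. It assumes that "search was needed" is the positive class and that each question carries a label saying whether the model could answer it from parametric knowledge alone. The `Episode` record and the `search_decision_f1` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One evaluated question (hypothetical record layout, not from the paper)."""
    searched: bool              # did the agent invoke search?
    known_without_search: bool  # assumed label: answerable from parametric knowledge alone?


def search_decision_f1(episodes):
    """F1 of the 'invoke search' decision, treating 'search needed' as positive."""
    # True positive: searched, and search really was needed.
    tp = sum(e.searched and not e.known_without_search for e in episodes)
    # False positive: searched although the model already knew the answer.
    fp = sum(e.searched and e.known_without_search for e in episodes)
    # False negative: skipped search although the model did not know the answer.
    fn = sum((not e.searched) and (not e.known_without_search) for e in episodes)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: the model knows the first two answers and not the last two.
known = [True, True, False, False]

always_search = [Episode(True, k) for k in known]   # 4 calls
adaptive = [Episode(not k, k) for k in known]       # 2 calls, on the right questions
misplaced = [Episode(k, k) for k in known]          # 2 calls, on the wrong questions

print(round(search_decision_f1(always_search), 3))  # 0.667
print(search_decision_f1(adaptive))                 # 1.0
print(search_decision_f1(misplaced))                # 0.0
```

In this toy setup the `adaptive` and `misplaced` agents make the same number of search calls yet receive opposite scores, which illustrates the abstract's point that call counts alone conflate necessary and unnecessary search.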

Related papers