Fugu-MT Paper Translation (Abstract): DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents

Paper overview: DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents

arxiv url: http://arxiv.org/abs/2602.07035v1
Date: Tue, 03 Feb 2026 09:12:08 GMT
Status: Translation complete
Last updated in system: 2026-02-10 20:26:24.385726
Title: DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents
Title (reference translation): DLLM-Searcher: Applying Diffusion Large Language Models to Search Agents
Authors: Jiahao Zhao, Shaoxuan Xu, Zhongxiang Sun, Fengqi Zhu, Jingyang Ou, Yuling Shi, Chongxuan Li, Xiao Zhang, Jun Xu
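Abstract summary: Diffusion Large Language Models (dLLMs) show unique efficiency advantages, enabled by their inherently parallel decoding mechanism and flexible generation paradigm. Despite the rapid progress of Search Agents, their practical deployment is constrained by a fundamental limitation, the Latency Challenge: the serial execution of multi-round reasoning, tool calling, and tool-response waiting. This paper proposes an optimization framework for dLLM-based search agents.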
Reference score (independently computed attention score): 31.08047797205678
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, Diffusion Large Language Models (dLLMs) have demonstrated unique efficiency advantages, enabled by their inherently parallel decoding mechanism and flexible generation paradigm. Meanwhile, despite the rapid advancement of Search Agents, their practical deployment is constrained by a fundamental limitation, termed the 1) Latency Challenge: the serial execution of multi-round reasoning, tool calling, and tool response waiting under the ReAct agent paradigm induces severe end-to-end latency. Intuitively, dLLMs can leverage their distinctive strengths to optimize the operational efficiency of agents under the ReAct agent paradigm. Practically, existing dLLM backbones face the 2) Agent Ability Challenge. That is, existing dLLMs exhibit remarkably weak reasoning and tool-calling capabilities, preventing these advantages from being effectively realized in practice. In this paper, we propose DLLM-Searcher, an optimization framework for dLLM-based Search Agents. To solve the Agent Ability Challenge, we design a two-stage post-training pipeline encompassing Agentic Supervised Fine-Tuning (Agentic SFT) and Agentic Variance-Reduced Preference Optimization (Agentic VRPO), which enhances the backbone dLLM's information seeking and reasoning capabilities. To mitigate the Latency Challenge, we leverage the flexible generation mechanism of dLLMs and propose a novel agent paradigm termed Parallel-Reasoning and Acting (P-ReAct). P-ReAct guides the model to prioritize decoding tool_call instructions, thereby allowing the model to keep thinking while waiting for the tool's return. Experimental results demonstrate that DLLM-Searcher achieves performance comparable to mainstream LLM-based search agents and P-ReAct delivers approximately 15% inference acceleration. Our code is available at https://anonymous.4open.science/r/DLLM-Searcher-553C
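Abstract (reference translation): In recent years, Diffusion Large Language Models (dLLMs) have shown distinctive efficiency gains thanks to their parallel decoding mechanism and flexible generation paradigm. At the same time, despite rapid progress in Search Agents, their practical deployment is held back by a fundamental limitation. 1) Latency Challenge: under the ReAct agent paradigm, the serial execution of multi-round reasoning, tool calling, and waiting for tool responses causes severe end-to-end latency. Intuitively, dLLMs can use their particular strengths to improve the operational efficiency of agents under the ReAct agent paradigm. In practice, however, existing dLLM backbones face a second problem. 2) Agent Ability Challenge: existing dLLMs show remarkably weak reasoning and tool-calling capabilities, which prevents these advantages from being realized effectively. This paper proposes DLLM-Searcher, an optimization framework for dLLM-based search agents. To address the Agent Ability Challenge, the authors design a two-stage post-training pipeline consisting of Agentic Supervised Fine-Tuning (Agentic SFT) and Agentic Variance-Reduced Preference Optimization (Agentic VRPO), which strengthens the backbone dLLM's information-seeking and reasoning capabilities. To mitigate the Latency Challenge, they exploit the flexible generation mechanism of dLLMs and propose a new agent paradigm called Parallel-Reasoning and Acting (P-ReAct). P-ReAct guides the model to decode tool_call instructions first, so the model can keep thinking while it waits for the tool's return. Experiments show that DLLM-Searcher performs comparably to mainstream LLM-based search agents and that P-ReAct delivers approximately 15% inference acceleration. The code is available at https://anonymous.4open.science/r/DLLM-Searcher-553C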
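
The latency argument in the abstract (decode the tool_call first, then keep reasoning while the tool runs) can be illustrated with a toy timing model. The sketch below is not the authors' implementation: `think`, `call_tool`, `serial_round`, `overlapped_round`, `measure`, and the two duration constants are all hypothetical stand-ins, and `asyncio.sleep` merely simulates decoding time and tool round-trip time. It only shows why overlapping the two phases shortens a round; it makes no attempt to reproduce the reported figure of approximately 15%.

```python
import asyncio
import time

# Hypothetical durations, chosen only for illustration.
THINK_SECONDS = 0.05  # simulated time to decode the reasoning span
TOOL_SECONDS = 0.10   # simulated search-tool round-trip time


async def think(seconds: float = THINK_SECONDS) -> str:
    """Stand-in for decoding the reasoning tokens."""
    await asyncio.sleep(seconds)
    return "reasoning"


async def call_tool(seconds: float = TOOL_SECONDS) -> str:
    """Stand-in for issuing a search query and waiting for the response."""
    await asyncio.sleep(seconds)
    return "tool response"


async def serial_round() -> float:
    """Serial round: finish reasoning, then call the tool and wait."""
    start = time.perf_counter()
    await think()
    await call_tool()
    return time.perf_counter() - start


async def overlapped_round() -> float:
    """Overlapped round: the tool call is issued first, reasoning continues."""
    start = time.perf_counter()
    tool_task = asyncio.create_task(call_tool())  # tool starts running now
    await think()                                 # reasoning overlaps the wait
    await tool_task                               # collect the tool response
    return time.perf_counter() - start


def measure() -> tuple[float, float]:
    """Return (serial, overlapped) wall-clock seconds for one round each."""
    serial = asyncio.run(serial_round())
    overlapped = asyncio.run(overlapped_round())
    return serial, overlapped


if __name__ == "__main__":
    serial, overlapped = measure()
    print(f"serial round:     {serial:.3f} s")
    print(f"overlapped round: {overlapped:.3f} s")
```

In this toy model the serial round costs roughly the sum of the two durations, while the overlapped round costs roughly the larger of the two. The actual saving in a real agent depends on how long reasoning takes relative to the tool wait, which the sketch does not model.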

Related papers