Fugu-MT Paper Translation (Abstract): MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning

Paper overview: MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning

arxiv url: http://arxiv.org/abs/2510.04935v1
Date: Mon, 06 Oct 2025 15:42:55 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.951349
Title: MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning
Title (reference translation): MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning
Authors: Guoxin Chen, Zile Qiao, Wenqing Wang, Donglei Yu, Xuanzhong Chen, Hao Sun, Minpeng Liao, Kai Fan, Yong Jiang, Penguin Xie, Wayne Xin Zhao, Ruihua Song, Fei Huang
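Abstract summary: Advancing Large Language Models (LLMs) for complex reasoning tasks requires innovative approaches that bridge intuitive and deliberate cognitive processes. This paper proposes a Multi-Agent System for Deep ReSearch (MARS).
Reference score (independently computed attention score): 82.14973479594367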
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Reasoning Models (LRMs) often exhibit a tendency for overanalysis in simple tasks, where the models excessively utilize System 2-type, deliberate reasoning, leading to inefficient token generation. Furthermore, these models face challenges in adapting their reasoning capabilities to rapidly changing environments due to the static nature of their pretraining data. To address these issues, advancing Large Language Models (LLMs) for complex reasoning tasks requires innovative approaches that bridge intuitive and deliberate cognitive processes, akin to human cognition's dual-system dynamic. This paper introduces a Multi-Agent System for Deep ReSearch (MARS) enabling seamless integration of System 1's fast, intuitive thinking with System 2's deliberate reasoning within LLMs. MARS strategically integrates multiple external tools, such as Google Search, Google Scholar, and Python Interpreter, to access up-to-date information and execute complex computations, while creating a specialized division of labor where System 1 efficiently processes and summarizes high-volume external information, providing distilled insights that expand System 2's reasoning context without overwhelming its capacity. Furthermore, we propose a multi-agent reinforcement learning framework extending Group Relative Policy Optimization to simultaneously optimize both systems with multi-turn tool interactions, bin-packing optimization, and sample balancing strategies that enhance collaborative efficiency. Extensive experiments demonstrate MARS achieves substantial improvements of 3.86% on the challenging Humanity's Last Exam (HLE) benchmark and an average gain of 8.9% across 7 knowledge-intensive tasks, validating the effectiveness of our dual-system paradigm for complex reasoning in dynamic information environments.
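Abstract (reference translation): Large Reasoning Models (LRMs) tend to overanalyze simple tasks, relying excessively on System 2-type deliberate reasoning, which leads to inefficient token generation. In addition, because their pretraining data is static, these models struggle to adapt their reasoning capabilities to rapidly changing environments. To address these issues, advancing Large Language Models (LLMs) for complex reasoning tasks requires innovative approaches that bridge intuitive and deliberate cognitive processes, akin to the dual-system dynamics of human cognition. This paper proposes a Multi-Agent System for Deep ReSearch (MARS). MARS strategically integrates multiple external tools, such as Google Search, Google Scholar, and Python Interpreter, to access up-to-date information and execute complex computations, while creating a specialized division of labor in which System 1 efficiently processes and summarizes high-volume external information. Furthermore, we propose a multi-agent reinforcement learning framework that extends Group Relative Policy Optimization. Extensive experiments show that MARS achieves a substantial improvement of 3.86% on the challenging Humanity's Last Exam (HLE) benchmark and an average gain of 8.9% across 7 knowledge-intensive tasks, validating the effectiveness of the dual-system paradigm for complex reasoning in dynamic information environments.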
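
Illustrative note: the abstract names two training ingredients, Group Relative Policy Optimization (GRPO) and bin-packing optimization, without giving details. The sketch below is a minimal, hypothetical illustration of the standard ideas behind those terms, not the authors' implementation. It assumes the usual GRPO advantage (each rollout's reward normalized by the mean and standard deviation of its own group) and substitutes a generic first-fit-decreasing heuristic for the unspecified bin-packing step. The function names, the `eps` constant, and the token-capacity parameter are all assumptions introduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt.

    Assumption: the standard GRPO form (r - mean) / (std + eps); the paper's
    multi-agent extension may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def first_fit_decreasing(lengths, capacity):
    """Pack samples (by token length) into bins of at most `capacity` tokens.

    Assumption: a generic first-fit-decreasing heuristic, used here only to
    illustrate what bin-packing of variable-length samples can look like.
    Returns a list of bins, each a list of sample indices.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins = []   # sample indices per bin
    loads = []  # running token count per bin
    for i in order:
        if lengths[i] > capacity:
            raise ValueError(f"sample {i} exceeds bin capacity")
        for b, load in enumerate(loads):
            if load + lengths[i] <= capacity:
                bins[b].append(i)
                loads[b] += lengths[i]
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([i])
            loads.append(lengths[i])
    return bins


if __name__ == "__main__":
    # Four rollouts for one prompt, with binary correctness rewards.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # Five samples with hypothetical token lengths and a 1000-token budget.
    print(first_fit_decreasing([700, 300, 500, 500, 200], capacity=1000))
```

In this sketch, rollouts that beat their group's average receive positive advantages and the rest negative ones, so no separate value network is needed; packing samples into fixed-capacity bins is one common way to reduce padding when rollouts from different agents vary widely in length.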

Related papers