Fugu-MT Paper Translation (Abstract): SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration

Paper summary: SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration

arxiv url: http://arxiv.org/abs/2510.19767v1
Date: Wed, 22 Oct 2025 16:56:01 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:16.156048
Title: SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration
Title (reference translation): SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking through Deeper Thought Exploration
Authors: Xichen Zhang, Sitong Wu, Haoru Tan, Shaozuo Yu, Yinghao Zhu, Ziyi He, Jiaya Jia
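Abstract summary: The long chain-of-thought (LongCoT) capability is central to the recent breakthroughs achieved by large language models in complex reasoning tasks. This paper proposes a simple yet effective reasoning strategy, the SmartSwitch inference framework. The framework can be easily integrated into any large language model as a plug-and-play solution.
Reference score (independently computed attention level): 49.290631188365786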
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The long chain-of-thought (LongCoT) capability is central to the recent breakthroughs achieved by large language models in complex reasoning tasks. However, the accompanying issue of "underthinking", where models exhibit shallow reasoning by frequently switching thoughts without sufficient exploration, limits both performance and token efficiency. To address this problem, we propose a simple yet effective reasoning strategy: the SmartSwitch inference framework. This framework can be easily integrated into any large language model as a plug-and-play solution, continuously monitoring the model's reasoning process to detect underthinking and guide it toward deeper exploration of promising but overlooked thoughts. Specifically, the perception module identifies points where thoughts switch and evaluates the potential of the preceding thought using an off-the-shelf process reward model (PRM). If a high-potential thought is found to be prematurely abandoned, the intervention module interrupts the ongoing inference, backtracks to the point before the switch, and inserts a "deepening prompt" to encourage further exploration along that promising path. Extensive experiments on challenging mathematical reasoning benchmarks demonstrate that our method significantly enhances the performance of various large language models of different sizes.
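Abstract (reference translation): The long chain-of-thought (LongCoT) capability is central to the recent breakthroughs that large language models have achieved on complex reasoning tasks. However, the problem of "underthinking", in which a model reasons shallowly by switching thoughts frequently without exploring them sufficiently, limits both performance and token efficiency. To address this problem, the authors propose a simple yet effective reasoning strategy, the SmartSwitch inference framework. The framework can be easily integrated into any large language model as a plug-and-play solution; it continuously monitors the model's reasoning process, detects underthinking, and guides the model toward deeper exploration of promising but overlooked thoughts. Specifically, the perception module identifies the points where thoughts switch and evaluates the potential of the preceding thought using an off-the-shelf process reward model (PRM). If a high-potential thought has been abandoned prematurely, the intervention module interrupts the ongoing inference, backtracks to the point before the switch, and inserts a "deepening prompt" to encourage further exploration along that promising path. Extensive experiments on challenging mathematical reasoning benchmarks show that the method significantly improves the performance of various large language models of different sizes.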
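
The perception and intervention steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how thought switches are detected, what the deepening prompt says, or which threshold is applied to the PRM score. The cue phrases in `SWITCH_CUES`, the text of `DEEPENING_PROMPT`, the `threshold` and `max_interventions` parameters, and the `generate_step` / `score_thought` callables are all hypothetical stand-ins for a real language model and a real process reward model.

```python
from typing import Callable, List, Optional

# Hypothetical cue phrases that mark a thought switch (assumption; the
# abstract does not specify the detection rule).
SWITCH_CUES = ("alternatively", "wait", "another approach")

# Hypothetical wording for the "deepening prompt" (assumption).
DEEPENING_PROMPT = "This line of reasoning looks promising; let me explore it further."


def is_thought_switch(step: str) -> bool:
    """Perception (part 1): flag a step that opens with a switch cue."""
    return step.strip().lower().startswith(SWITCH_CUES)


def smart_switch(
    generate_step: Callable[[List[str]], Optional[str]],
    score_thought: Callable[[List[str]], float],
    threshold: float = 0.7,       # assumed cut-off on the PRM score
    max_steps: int = 32,
    max_interventions: int = 2,   # assumed cap so the loop always ends
) -> List[str]:
    """Generate a reasoning trace, intervening on premature thought switches."""
    trace: List[str] = []
    thought_start = 0   # index in `trace` where the current thought began
    interventions = 0
    for _ in range(max_steps):
        step = generate_step(trace)
        if step is None:          # the model has finished
            break
        switching = bool(trace) and is_thought_switch(step)
        if switching and interventions < max_interventions:
            # Perception (part 2): score the thought being abandoned.
            if score_thought(trace[thought_start:]) >= threshold:
                # Intervention: drop the switching step (backtrack to the
                # point before the switch) and insert the deepening prompt.
                trace.append(DEEPENING_PROMPT)
                interventions += 1
                continue
        if switching:
            thought_start = len(trace)  # a new thought starts at this step
        trace.append(step)
    return trace


def toy_generate(trace: List[str]) -> Optional[str]:
    """Scripted stand-in for an LLM that gives up on its first idea."""
    if not trace:
        return "Try substituting x = 2."
    if trace[-1] == DEEPENING_PROMPT:
        return "Continuing: x = 2 gives the answer 4."
    if trace[-1].startswith("Continuing"):
        return None
    return "Alternatively, try a different method."


if __name__ == "__main__":
    # A constant high score stands in for a PRM that rates the thought highly.
    for line in smart_switch(toy_generate, lambda thought: 0.9):
        print(line)
```

With the toy model and a high constant score, the switch to "Alternatively, ..." is discarded and replaced by the deepening prompt, after which the scripted model continues its first idea. With a low score, the switch is accepted unchanged. A real deployment would need token-level interruption of the decoder and an actual PRM, neither of which is modeled here.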
