Fugu-MT Paper Translation (Abstract): Decoupling Understanding from Reasoning via Problem Space Mapping for Small-scale Model Reasoning

Paper overview: Decoupling Understanding from Reasoning via Problem Space Mapping for Small-scale Model Reasoning

arxiv url: http://arxiv.org/abs/2508.10019v1
Date: Thu, 07 Aug 2025 01:13:30 GMT
Status: Translation complete
System update date: 2025-08-15 22:24:48.013209
Title: Decoupling Understanding from Reasoning via Problem Space Mapping for Small-scale Model Reasoning
Authors: Li Wang, Changhao Zhang, Zengqi Xiu, Kai Lu, Xin Yu, Kui Zhang, Wenjun Wu
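Abstract summary: This paper proposes a new framework that decouples understanding from reasoning by mapping natural language problems into a canonical problem space. Within this framework, it introduces DURIT, a three-step algorithm that iteratively maps problems into that space, aligns reasoning trajectories through self-distillation, and trains reasoning policies. Experiments show that DURIT substantially improves the performance of SLMs on both in-domain and out-of-domain mathematical and logical reasoning tasks.
Reference score (attention score computed independently by this site): 22.582715282848795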
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite recent advances in the reasoning capabilities of Large Language Models (LLMs), improving the reasoning ability of Small Language Models (SLMs, e.g., $\leq$ 1.5B) remains challenging. A key obstacle lies in the complexity and variability of natural language: essentially equivalent problems appear in diverse surface forms and are often obscured by redundant or distracting details. This imposes a dual burden on SLMs: they must first extract the core problem from complex linguistic input, and then perform reasoning based on that understanding. The resulting vast and noisy problem space hinders optimization, particularly for models with limited capacity. To address this, we propose a new framework that decouples understanding from reasoning by mapping natural language problems into a canonical problem space, a semantically simplified yet expressive domain. This enables SLMs to focus on reasoning over standardized inputs, free from linguistic variability. Within this framework, we introduce DURIT (Decoupled Understanding from Reasoning via Iterative Training), a three-step algorithm that iteratively: (1) maps natural language problems via reinforcement learning, (2) aligns reasoning trajectories through self-distillation, and (3) trains reasoning policies in the problem space. The mapper and reasoner are co-trained in an alternating loop throughout this process. Experiments show that DURIT substantially improves SLMs' performance on both in-domain and out-of-domain mathematical and logical reasoning tasks. Beyond improving reasoning capabilities, DURIT also improves the robustness of reasoning, validating the decoupling of understanding from reasoning as an effective strategy for strengthening SLMs.
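The alternating three-step loop described in the abstract can be pictured with a rough control-flow sketch. This is not the authors' implementation: every name here (`toy_mapper`, `durit_schedule`, `noop_distill`, `noop_train`) is a hypothetical placeholder, the mapper is plain text normalization rather than a policy trained with reinforcement learning, and the distillation and reasoner-training steps are stubs. It only illustrates the order in which the three steps repeat.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholders throughout; none of these names come from the paper.


def toy_mapper(problems: List[str]) -> List[str]:
    """Stand-in for step (1): collapse surface variation into one canonical form.

    The abstract says this mapping is learned with reinforcement learning;
    here it is only case and whitespace normalization.
    """
    return [" ".join(p.lower().split()) for p in problems]


def noop_distill(raw: List[str], canonical: List[str]) -> None:
    """Stub for step (2): aligning reasoning trajectories via self-distillation."""
    return None


def noop_train(canonical: List[str]) -> None:
    """Stub for step (3): training the reasoning policy on canonical inputs."""
    return None


def durit_schedule(
    problems: List[str],
    mapper: Callable[[List[str]], List[str]],
    distill: Callable[[List[str], List[str]], None],
    train_reasoner: Callable[[List[str]], None],
    num_rounds: int,
) -> List[Tuple[int, str]]:
    """Run the three steps in an alternating loop and record their order."""
    trace: List[Tuple[int, str]] = []
    for round_idx in range(num_rounds):
        canonical = mapper(problems)        # (1) map into the canonical problem space
        trace.append((round_idx, "map"))
        distill(problems, canonical)        # (2) align reasoning trajectories
        trace.append((round_idx, "distill"))
        train_reasoner(canonical)           # (3) train the reasoner on canonical inputs
        trace.append((round_idx, "train_reasoner"))
    return trace


# Two surface variants of the same (made-up) problem statement.
variants = ["If  Tom HAS 3 apples", "if tom has 3 apples"]
trace = durit_schedule(variants, toy_mapper, noop_distill, noop_train, num_rounds=2)
```

In this toy setting both variants collapse to the same canonical string, which is the property the framework relies on: the reasoner only ever sees standardized inputs.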

Related papers