Fugu-MT Paper Translation (Abstract): LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance

Paper overview: LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance

arxiv url: http://arxiv.org/abs/2605.22567v1
Date: Thu, 21 May 2026 14:47:52 GMT
Status: Translation complete
System update date: 2026-05-22 20:14:18.587882
Title: LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance
Authors: Yuchun Fan, Bei Li, Peiguang Li, Yilin Wang, Yongyu Mu, Jian Yang, Xin Chen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Jingbo Zhu, Tong Xiao
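Abstract summary: Reinforcement learning has proven effective for enhancing multi-step reasoning in large language models. However, its benefits have not fully translated to multilingual contexts. We develop a novel framework that leverages language-conditioned hints to guide exploration in non-English reasoning tasks.
Reference score (independently computed attention score): 77.58408743830314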
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning has proven effective for enhancing multi-step reasoning in large language models (LLMs), yet its benefits have not fully translated to multilingual contexts. Existing methods struggle with a fundamental trade-off: prioritizing input-language consistency severely hampers reasoning quality, while prioritizing reasoning often leads to unintended language drift toward English. We address this challenge with LANG, a novel framework that leverages language-conditioned hints to guide exploration in non-English reasoning tasks. Our method incorporates two key mechanisms to prevent dependency on these hints: a progressive decay schedule that gradually withdraws scaffolding, and a language-adaptive switch that tailors learning horizons to specific language difficulties. Empirical results on challenging multilingual mathematical benchmarks reveal that LANG substantially enhances reasoning performance without compromising language consistency. Moreover, we show that our framework generalizes beyond mathematics, fostering more consistent language alignment across model layers.
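
Illustrative sketch (not from the paper): the abstract names a progressive decay schedule and a language-adaptive switch but does not give their form. The Python below is one possible reading, assuming a linear decay of the probability of attaching a hint and a fixed per-language horizon after which hints are switched off. The function names, the `HORIZONS` table, its language codes, and its step counts are all hypothetical.

```python
import random

# Hypothetical per-language horizons (in training steps). The assumption is
# that harder languages keep hints longer; these numbers are invented.
HORIZONS = {"fr": 100, "ja": 200, "sw": 400}


def hint_probability(step: int, horizon: int, p0: float = 1.0) -> float:
    """Probability of attaching a hint at a given training step.

    Assumed form: linear decay from p0 at step 0 to 0 at `horizon`,
    and 0 afterwards (the "switch" is simply reaching the horizon).
    """
    if horizon <= 0:
        return 0.0
    remaining = max(0.0, 1.0 - step / horizon)
    return p0 * remaining


def build_prompt(question: str, hint: str, lang: str, step: int,
                 rng: random.Random) -> str:
    """Attach the language-conditioned hint with a decaying probability."""
    p = hint_probability(step, HORIZONS[lang])
    if rng.random() < p:
        return f"{question}\n{hint}"
    return question
```

Under these assumptions, `build_prompt` always includes the hint at step 0 and never includes it once `step` reaches the language's horizon, so a language with a longer horizon keeps its scaffolding for more of training.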

Related papers list