Fugu-MT Paper Translation (Abstract): Language as a Latent Variable for Reasoning Optimization

Paper summary: Language as a Latent Variable for Reasoning Optimization

arxiv url: http://arxiv.org/abs/2604.21593v1
Date: Thu, 23 Apr 2026 12:19:14 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.483604
Title: Language as a Latent Variable for Reasoning Optimization
Title (reference translation): Language as a Latent Variable for Reasoning Optimization
Authors: Linjuan Wu, Haoran Wei, Jialong Tang, Shuang Luo, Baosong Yang, Yongliang Shen, Weiming Lu
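Abstract summary: As LLMs reduce English-centric bias, a surprising trend emerges: non-English responses sometimes outperform English on reasoning tasks. The authors hypothesize that language functions as a latent variable that structurally modulates the model's internal inference pathways. They propose polyGRPO, an RL framework that treats language variation as an implicit exploration signal.
Reference score (site-computed attention score): 45.35129925776798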
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As LLMs reduce English-centric bias, a surprising trend emerges: non-English responses sometimes outperform English on reasoning tasks. We hypothesize that language functions as a latent variable that structurally modulates the model's internal inference pathways, rather than merely serving as an output medium. To test this, we conducted a Polyglot Thinking Experiment, in which models were prompted to solve identical problems under language-constrained and language-unconstrained conditions. Results show that non-English responses often achieve higher accuracy, and the best performance frequently occurs when language is unconstrained, suggesting that multilinguality broadens the model's latent reasoning space. Based on this insight, we propose polyGRPO (Polyglot Group Relative Policy Optimization), an RL framework that treats language variation as an implicit exploration signal. It generates polyglot preference data online under language-constrained and unconstrained conditions, optimizing the policy with respect to both answer accuracy and reasoning structure. Trained on only 18.1K multilingual math problems without chain-of-thought annotations, polyGRPO improves the base model (Qwen2.5-7B-Instruct) by 6.72% absolute accuracy on four English reasoning test sets and by 6.89% on their multilingual benchmark. Remarkably, it is the only method that surpasses the base LLM on an English commonsense reasoning task (4.9%), despite being trained solely on math data, highlighting its strong cross-task generalization. Further analysis reveals that treating language as a latent variable expands the model's latent reasoning space, yielding consistent and generalizable improvements in reasoning performance.
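Abstract (reference translation): As LLMs reduce their English-centric bias, a surprising trend emerges: non-English responses sometimes outperform English ones on reasoning tasks. We hypothesize that language functions as a latent variable that structurally modulates the model's internal inference pathways, rather than serving merely as an output medium. To test this, we ran a Polyglot Thinking Experiment in which models solved identical problems under language-constrained and language-unconstrained conditions. The results show that non-English responses often achieve higher accuracy and that the best performance frequently occurs when language is unconstrained, suggesting that multilinguality broadens the model's latent reasoning space. Based on this finding, we propose polyGRPO (Polyglot Group Relative Policy Optimization), an RL framework that treats language variation as an implicit exploration signal. It generates polyglot preference data online under language-constrained and unconstrained conditions and optimizes the policy with respect to both answer accuracy and reasoning structure. Trained on only 18.1K multilingual math problems without chain-of-thought annotations, polyGRPO improves the base model (Qwen2.5-7B-Instruct) by 6.72% absolute accuracy on four English reasoning test sets and by 6.89% on their multilingual benchmark. Notably, it is the only method that surpasses the base LLM on an English commonsense reasoning task (4.9%), despite being trained solely on math data, which highlights its strong cross-task generalization. Further analysis shows that treating language as a latent variable expands the model's latent reasoning space, yielding consistent and generalizable improvements in reasoning performance.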
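
Illustrative sketch (not from the paper): the abstract does not give polyGRPO's reward or update rule, so the snippet below only shows the generic group-relative advantage that GRPO-style methods compute, applied to a hypothetical group of responses sampled under different language conditions. The condition labels, the binary correctness reward, the use of the population standard deviation, and the function name `group_relative_advantages` are all assumptions made for illustration. The reasoning-structure term mentioned in the abstract is not modeled here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style).

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation (plus eps for stability).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group for a single math problem: one response per
# language condition, rewarded 1.0 if the final answer is correct.
samples = [
    ("en", 0.0),
    ("zh", 1.0),
    ("ja", 0.0),
    ("unconstrained", 1.0),
]

rewards = [reward for _, reward in samples]
advantages = group_relative_advantages(rewards)

for (condition, reward), adv in zip(samples, advantages):
    print(f"{condition:>13}: reward={reward:.1f} advantage={adv:+.3f}")
```

In this toy group, correct responses receive positive advantages and incorrect ones negative, regardless of language condition. This is one way to read the abstract's description of language variation as an exploration signal: different languages contribute diverse candidates to the same group, and the reward alone decides which are reinforced.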

List of related papers