Fugu-MT Paper Translation (Summary): Long Chain-of-Thought Reasoning Across Languages

Paper summary: Long Chain-of-Thought Reasoning Across Languages

arxiv url: http://arxiv.org/abs/2508.14828v1
Date: Wed, 20 Aug 2025 16:22:51 GMT
Status: Translation complete
Last updated in system: 2025-08-21 16:52:41.525066
Title: Long Chain-of-Thought Reasoning Across Languages
Authors: Josh Barua, Seun Eisape, Kayo Yin, Alane Suhr
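Abstract summary: Scaling inference through long chains-of-thought (CoTs) has unlocked impressive reasoning capabilities in large language models (LLMs). This work constructs translated versions of two English reasoning datasets, fine-tunes Qwen 2.5 (7B) and Qwen 3 (8B) models, and presents a systematic study of long CoT generation across French, Japanese, Latvian, and Swahili.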
Reference score (attention score computed in-house): 11.823604358250149
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scaling inference through long chains-of-thought (CoTs) has unlocked impressive reasoning capabilities in large language models (LLMs), yet the reasoning process remains almost exclusively English-centric. We construct translated versions of two popular English reasoning datasets, fine-tune Qwen 2.5 (7B) and Qwen 3 (8B) models, and present a systematic study of long CoT generation across French, Japanese, Latvian, and Swahili. Our experiments reveal three key findings. First, the efficacy of using English as a pivot language varies by language: it provides no benefit for French, improves performance when used as the reasoning language for Japanese and Latvian, and proves insufficient for Swahili where both task comprehension and reasoning remain poor. Second, extensive multilingual pretraining in Qwen 3 narrows but does not eliminate the cross-lingual performance gap. A lightweight fine-tune using only 1k traces still improves performance by over 30% in Swahili. Third, data quality versus scale trade-offs are language dependent: small, carefully curated datasets suffice for English and French, whereas larger but noisier corpora prove more effective for Swahili and Latvian. Together, these results clarify when and why long CoTs transfer across languages and provide translated datasets to foster equitable multilingual reasoning research.


Related papers