Fugu-MT Paper Translation (Summary): On Code-Induced Reasoning in LLMs

Paper summary: On Code-Induced Reasoning in LLMs

arxiv url: http://arxiv.org/abs/2509.21499v2
Date: Thu, 02 Oct 2025 16:45:24 GMT
System update date: 2025-10-03 14:32:17.120773
Title: On Code-Induced Reasoning in LLMs
Authors: Abdul Waheed, Zhen Wu, Carolyn Rosé, Daphne Ippolito,
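Abstract summary: We construct parallel instruction datasets in ten programming languages and apply controlled perturbations that selectively disrupt structural or semantic properties of code. The results suggest that LLMs are more vulnerable to structural perturbations than to semantic ones.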
Reference score (site-computed attention score): 21.875805779552564
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code data has been shown to enhance the reasoning capabilities of large language models (LLMs), but it remains unclear which aspects of code are most responsible. We investigate this question with a systematic, data-centric framework. We construct parallel instruction datasets in ten programming languages and apply controlled perturbations that selectively disrupt structural or semantic properties of code. We then finetune LLMs from five model families and eight scales on each variant and evaluate their performance on natural language, math, and code tasks. Across 3,331 experiments, our results show that LLMs are more vulnerable to structural perturbations than semantic ones, particularly on math and code tasks. Appropriate abstractions like pseudocode and flowcharts can be as effective as code, while encoding the same information with fewer tokens without adhering to original syntax can often retain or even improve performance. Remarkably, even corrupted code with misleading signals remains competitive when surface-level regularities persist. Finally, syntactic styles also shape task-specific gains with Python favoring natural language reasoning and lower-level languages such as Java and Rust favoring math. Through our systematic framework, we aim to provide insight into how different properties of code influence reasoning and inform the design of training data for enhancing LLM reasoning capabilities.