Fugu-MT Paper Translation (Abstract): Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning

Paper overview: Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning

arxiv url: http://arxiv.org/abs/2601.21894v1
Date: Thu, 29 Jan 2026 15:54:40 GMT
Status: Translation complete
Last updated in system: 2026-01-30 16:22:49.959146
Title: Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning
Authors: Lukas Twist, Shu Yang, Hanqi Yan, Jingzhi Gong, Di Wang, Helen Yannakoudakis, Jie M. Zhang
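Abstract summary: Large language models (LLMs) increasingly exhibit strong reasoning abilities, often attributed to their capacity to generate chain-of-thought-style intermediate reasoning. Recent work suggests that exposure to code can further strengthen these skills, but existing studies largely treat code as a generic training signal. This study examines the structural complexity of code, which captures control flow and compositional structure that may shape how models internalise multi-step reasoning during fine-tuning.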
Reference score (independently computed attention score): 16.919028520729793
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) increasingly exhibit strong reasoning abilities, often attributed to their capacity to generate chain-of-thought-style intermediate reasoning. Recent work suggests that exposure to code can further enhance these skills, but existing studies largely treat code as a generic training signal, leaving open the question of which properties of code actually contribute to improved reasoning. To address this gap, we study the structural complexity of code, which captures control flow and compositional structure that may shape how models internalise multi-step reasoning during fine-tuning. We examine two complementary settings: solution-driven complexity, where complexity varies across multiple solutions to the same problem, and problem-driven complexity, where complexity reflects variation in the underlying tasks. Using cyclomatic complexity and logical lines of code to construct controlled fine-tuning datasets, we evaluate a range of open-weight LLMs on diverse reasoning benchmarks. Our findings show that although code can improve reasoning, structural properties strongly determine its usefulness. In 83% of experiments, restricting fine-tuning data to a specific structural complexity range outperforms training on structurally diverse code, pointing to a data-centric path for improving reasoning beyond scaling.
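The abstract names cyclomatic complexity and logical lines of code as the metrics used to build the controlled fine-tuning datasets, but it does not say which tooling was used. The following is a minimal sketch of how such metrics could be approximated for Python snippets with the standard `ast` module. The function names (`cyclomatic_complexity`, `logical_loc`, `filter_by_complexity`) and the choice of which nodes count as decision points are assumptions for illustration, not the authors' implementation.

```python
import ast

# Node types counted as one decision point each.
# This set is an assumption; real complexity tools differ in the details.
_BRANCH_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has three operands and adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def logical_loc(source: str) -> int:
    """Count statement nodes as a rough proxy for logical lines of code."""
    return sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))


def filter_by_complexity(samples, low, high):
    """Keep only the samples whose complexity lies in [low, high]."""
    return [s for s in samples if low <= cyclomatic_complexity(s) <= high]


simple = "def f(x):\n    return x + 1\n"
branchy = (
    "def g(x):\n"
    "    if x > 0 and x < 10:\n"
    "        return 1\n"
    "    for i in range(x):\n"
    "        if i % 2:\n"
    "            x += i\n"
    "    return x\n"
)

# Restrict a (hypothetical) pool of code samples to one complexity band.
selected = filter_by_complexity([simple, branchy], low=3, high=6)
```

Filtering a sample pool to a single band, as in the last line, mirrors the restriction to a specific structural complexity range described in the abstract; the band boundaries here are arbitrary.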

