Fugu-MT Paper Translation (Abstract): Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code

Paper abstract: Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code

arxiv url: http://arxiv.org/abs/2604.12214v1
Date: Tue, 14 Apr 2026 02:48:29 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.202463
Title: Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code
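Title (reference translation): On CoT Robustness in LLM4Code
Authors: Yang Liu, Da Song, Armstrong Foundjem, Heng Li, Foutse Khomh
Abstract summary: Chain-of-Thought (CoT) prompting is widely used to elicit explicit reasoning from large language models for code (LLM4Code). We study how CoT reshapes internal uncertainty dynamics and why it sometimes harms rather than helps code generation.
Reference score (independently computed attention level): 13.598118096561775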
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Chain-of-Thought (CoT) prompting is widely used to elicit explicit reasoning from large language models for code (LLM4Code). However, its impact on robustness and the stability of reasoning trajectories under realistic input perturbations remains poorly understood. Prior work has largely evaluated CoT through final correctness, leaving a critical gap in understanding how CoT reshapes internal uncertainty dynamics and why it sometimes harms rather than helps code generation. We suggest that CoT is not uniformly beneficial; instead, its robustness depends on whether perturbations destabilize structurally sensitive commitment points along the reasoning-to-code trajectory. We conduct a controlled, large-scale empirical study of CoT across six models and two code benchmarks (MHPP and BigCodeBench), subjecting task docstrings to systematic character-, word-, and sentence-level perturbations. We instrument full generation traces with token-level uncertainty and define three novel structural anchors: reasoning-code transition, symbolic commitment, and algorithmic articulation. Findings: (1) CoT does not yield uniform performance or robustness gains: its benefits are contingent on model family, task structure, and prompt explicitness. (2) CoT and No-CoT exhibit distinct robustness profiles, with different perturbation families triggering different failure modes. (3) We identify three recurrent trajectory deformations--Lengthening, Branching, and Simplification--that systematically emerge when perturbations interact with structural anchors and explain failure patterns. (4) Early-stage uncertainty serves as a reliable diagnostic signal for localizing where trajectory instability begins around sensitive anchors. These results provide a unified explanation for CoT's mixed performance and suggest design principles for building more robust reasoning-based code generators.
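Abstract (reference translation): Chain-of-Thought (CoT) prompting is widely used to elicit explicit reasoning from large language models for code (LLM4Code). However, its effect on robustness and on the stability of reasoning trajectories under realistic input perturbations is not well understood. Prior work has evaluated CoT largely through final correctness, leaving a critical gap in understanding how CoT reshapes internal uncertainty dynamics and why it sometimes harms rather than helps code generation. We suggest that CoT is not uniformly beneficial; instead, its robustness depends on whether perturbations destabilize structurally sensitive commitment points along the reasoning-to-code trajectory. We conduct a controlled, large-scale empirical study of CoT across six models and two code benchmarks (MHPP and BigCodeBench), subjecting task docstrings to systematic character-, word-, and sentence-level perturbations. We instrument full generation traces with token-level uncertainty and define three novel structural anchors: reasoning-code transition, symbolic commitment, and algorithmic articulation. Findings: (1) CoT does not yield uniform performance or robustness gains: its benefits depend on model family, task structure, and prompt explicitness. (2) CoT and No-CoT exhibit distinct robustness profiles, with different perturbation families triggering different failure modes. (3) We identify three recurrent trajectory deformations--Lengthening, Branching, and Simplification--that emerge systematically when perturbations interact with structural anchors and that explain failure patterns. (4) Early-stage uncertainty serves as a reliable diagnostic signal for localizing where trajectory instability begins around sensitive anchors. These results provide a unified explanation for CoT's mixed performance and suggest design principles for building more robust reasoning-based code generators.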
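
The abstract mentions character-level perturbations of task docstrings and token-level uncertainty measured early in generation, but gives no formulas or procedures. The following is only a minimal sketch of how such quantities might be computed. Everything in it is an assumption: the adjacent-letter swap as the perturbation, Shannon entropy as the uncertainty measure, the averaging window, and the function names `char_perturb`, `token_entropy`, and `early_stage_uncertainty` are all hypothetical and are not taken from the paper.

```python
import math
import random


def char_perturb(text: str, rate: float, seed: int = 0) -> str:
    """Swap adjacent letters with probability `rate`.

    Assumption: a simple typo-like swap stands in for the paper's
    character-level perturbations, which the abstract does not specify.
    """
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


def token_entropy(probs) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution.

    Assumption: entropy is used here as the token-level uncertainty;
    the paper may use a different measure.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_stage_uncertainty(step_distributions, window: int = 3) -> float:
    """Mean entropy over the first `window` generation steps (hypothetical window)."""
    head = step_distributions[:window]
    if not head:
        raise ValueError("no generation steps supplied")
    return sum(token_entropy(p) for p in head) / len(head)


if __name__ == "__main__":
    docstring = "Return the sum of all even numbers in the list."
    print(char_perturb(docstring, rate=0.2, seed=1))

    # Made-up next-token distributions for three generation steps.
    steps = [[0.9, 0.1], [0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
    print(early_stage_uncertainty(steps, window=2))
```

In a real setting, the per-step distributions would come from a model's output probabilities for the clean and the perturbed docstring, and the two early-stage values could then be compared; the distributions above are invented purely for illustration.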
