Fugu-MT Paper Translation (Abstract): On the Step Length Confounding in LLM Reasoning Data Selection

Paper summary: On the Step Length Confounding in LLM Reasoning Data Selection

arxiv url: http://arxiv.org/abs/2604.06834v1
Date: Wed, 08 Apr 2026 08:51:47 GMT
Status: Translation complete
Last updated in system: 2026-04-09 17:30:51.438341
Title: On the Step Length Confounding in LLM Reasoning Data Selection
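Title (reference translation): On Step Length Confounding in LLM Reasoning Data Selection
Authors: Bing Wang, Rui Miao, Chen Shen, Shaotian Yan, Kaiyuan Liu, Ximing Li, Xiaosong Yuan, Sinan Fan, Jun Zhang, Jieping Ye
Abstract summary: Naturalness-based data selection, when applied to LLM reasoning data, prefers samples with longer reasoning steps over higher-quality ones. The authors term this phenomenon step length confounding and propose two methods to mitigate it.
Reference score (independently computed attention score): 46.02555419476045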
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large reasoning models have recently demonstrated strong performance on complex tasks that require long chain-of-thought reasoning, through supervised fine-tuning on large-scale and high-quality datasets. To construct such datasets, existing pipelines generate long reasoning data from more capable Large Language Models (LLMs) and apply manually heuristic or naturalness-based selection methods to filter high-quality samples. Despite the proven effectiveness of naturalness-based data selection, which ranks data by the average log probability assigned by LLMs, our analysis shows that, when applied to LLM reasoning datasets, it systematically prefers samples with longer reasoning steps (i.e., more tokens per step) rather than higher-quality ones, a phenomenon we term step length confounding. Through quantitative analysis, we attribute this phenomenon to low-probability first tokens in reasoning steps; longer steps dilute their influence, thereby inflating the average log probabilities. To address this issue, we propose two variant methods: ASLEC-DROP, which drops first-token probabilities when computing average log probability, and ASLEC-CASL, which applies a causal debiasing regression to remove the first tokens' confounding effect. Experiments across four LLMs and five evaluation benchmarks demonstrate the effectiveness of our approach in mitigating the step length confounding problem.
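Abstract (reference translation): Large reasoning models have shown strong performance on complex tasks that require long chain-of-thought reasoning, through supervised fine-tuning on large-scale, high-quality datasets. To build such datasets, existing pipelines generate long reasoning data from more capable Large Language Models (LLMs) and apply manual heuristic or naturalness-based selection methods to filter high-quality samples. Although naturalness-based data selection, which ranks data by the average log probability assigned by LLMs, has proven effective, when applied to LLM reasoning datasets it systematically prefers samples with longer reasoning steps (i.e., more tokens per step) over higher-quality ones, a phenomenon termed step length confounding. Quantitative analysis attributes this phenomenon to low-probability first tokens in reasoning steps: longer steps dilute their influence and thereby inflate the average log probability. To address this issue, two methods are proposed: ASLEC-DROP, which drops first-token probabilities when computing the average log probability, and ASLEC-CASL, which applies a causal debiasing regression to remove the confounding effect of the first tokens. Experiments across four LLMs and five evaluation benchmarks show that the approach is effective in mitigating the step length confounding problem.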
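The dilution effect described in the abstract can be illustrated with a toy calculation. The following is a minimal sketch, not the authors' implementation: the function names `avg_logprob` and `avg_logprob_drop_first` are hypothetical, the log probabilities are made-up numbers, and the second function only mirrors the one-line description of ASLEC-DROP given in the abstract (dropping first-token probabilities before averaging). ASLEC-CASL is not sketched because the abstract does not specify its regression.

```python
from typing import List


def avg_logprob(steps: List[List[float]]) -> float:
    """Plain naturalness score: mean log probability over all tokens."""
    tokens = [lp for step in steps for lp in step]
    if not tokens:
        raise ValueError("sample contains no tokens")
    return sum(tokens) / len(tokens)


def avg_logprob_drop_first(steps: List[List[float]]) -> float:
    """DROP-style variant (assumed form): skip the first token of each
    reasoning step before averaging."""
    tokens = [lp for step in steps for lp in step[1:]]
    if not tokens:
        raise ValueError("no tokens left after dropping first tokens")
    return sum(tokens) / len(tokens)


# Made-up toy data: the first token of every step has log prob -3.0,
# every other token has log prob -0.5. Both samples have 12 tokens.
short_steps = [[-3.0, -0.5, -0.5] for _ in range(4)]  # 4 steps x 3 tokens
long_steps = [[-3.0] + [-0.5] * 5 for _ in range(2)]  # 2 steps x 6 tokens

# The plain average favors the sample with longer steps, because its
# low-probability first tokens are diluted by more ordinary tokens.
print(avg_logprob(short_steps), avg_logprob(long_steps))

# After dropping first tokens, the two samples receive the same score.
print(avg_logprob_drop_first(short_steps), avg_logprob_drop_first(long_steps))
```

In this toy setting the two samples differ only in how their tokens are split into steps, so any gap in the plain average comes from step length alone, which is the confounding the abstract describes.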

Related papers