Fugu-MT Paper Translation (Abstract): Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts

Paper overview: Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts

arxiv url: http://arxiv.org/abs/2605.07307v1
Date: Fri, 08 May 2026 06:15:50 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.850045
Title: Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts
Title (reference translation): Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffled Chains of Thought
Authors: Yi-Chang Chen, Feng-Ting Liao, Da-shan Shiu, Hung-yi Lee
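Abstract summary: Modern reasoning language models generate sequential chain-of-thought traces, implicitly assuming that every token contributes and that steps must be consumed in order. We challenge both assumptions through a systematic intervention pipeline (removal, masking, shuffling, and noise injection) applied to model-generated reasoning chains. Answer extraction operates on a sparse, order-insensitive, and structurally robust informational substrate.
Reference score (independently computed attention score): 51.84894623128418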
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern reasoning language models generate dense, sequential chain-of-thought traces implicitly assuming that every token contributes and that steps must be consumed in order. We challenge both assumptions through a systematic intervention pipeline--removal, masking, shuffling, and noise injection--applied to model-generated reasoning chains across three models and three benchmarks. Our findings are counterintuitive on three dimensions. Order: Does the sequential order of a reasoning chain matter for answer extraction? No--line-level shuffling reduces accuracy by less than 0.5 pp; word-level shuffling retains 62%-89% accuracy; only token-level shuffling collapses to near zero. Pretrained-only and instruction-tuned variants exhibit near-identical tolerance (78.67% vs. 78.00% under line shuffling), indicating order-independence originates from pretraining rather than reasoning-specific fine-tuning. Dense: Is all the information in a reasoning chain important for answer extraction? No--masking numeric digits collapses accuracy to exactly 0%, while masking alphabetic prose improves accuracy by 4.7 pp. Robustness: Is a reasoning chain that is both order-shuffling and non-dense still robust? Yes--the most aggressively reduced representation (all natural language removed, lines arbitrarily shuffled) still achieves 83% accuracy, and injecting false answers at 3x true-answer frequency leaves accuracy unchanged (83.3%->83.3%), falsifying a frequency-based extraction account. These results establish that answer extraction operates on a sparse, order-insensitive, and structurally robust informational substrate, opening paths toward parallelized and token-efficient reasoning generation.
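Abstract (reference translation): Modern reasoning language models generate dense, sequential chain-of-thought traces, implicitly assuming that every token contributes and that steps must be consumed in order. We challenge both assumptions through a systematic intervention pipeline of removal, masking, shuffling, and noise injection, applied to model-generated reasoning chains across three models and three benchmarks. Our findings are counterintuitive on three dimensions. Order: does the sequential order of a reasoning chain matter for answer extraction? No: line-level shuffling reduces accuracy by less than 0.5 pp, word-level shuffling retains 62%-89% accuracy, and only token-level shuffling collapses to near zero. Pretrained-only and instruction-tuned variants show near-identical tolerance (78.67% vs. 78.00% under line shuffling). Dense: is all the information in a reasoning chain important for answer extraction? No: masking numeric digits drops accuracy to exactly 0%, while masking alphabetic prose improves accuracy by 4.7 pp. Robustness: is a reasoning chain that is both order-shuffled and non-dense still robust? The most aggressively reduced representation (all natural language removed, lines arbitrarily shuffled) still achieves 83% accuracy, and injecting false answers at 3x the true-answer frequency leaves accuracy unchanged (83.3%->83.3%), falsifying a frequency-based extraction account. These results establish that answer extraction operates on a sparse, order-insensitive, and structurally robust informational substrate, opening paths toward parallelized and token-efficient reasoning generation.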
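
Illustrative sketch (not from the paper): the abstract names four kinds of intervention on a reasoning chain but does not give their implementation. The Python below is one plausible reading of the string-level ones. All function names, the "_" mask character, and the wording of the injected false-answer line are assumptions made for illustration. Token-level shuffling is omitted because it depends on a specific model tokenizer.

```python
import random
import re


def shuffle_lines(chain: str, seed: int = 0) -> str:
    """Permute whole lines; the text inside each line is left intact."""
    lines = chain.split("\n")
    random.Random(seed).shuffle(lines)
    return "\n".join(lines)


def shuffle_words(chain: str, seed: int = 0) -> str:
    """Permute whitespace-separated words across the whole chain."""
    words = chain.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def mask_digits(chain: str, mask: str = "_") -> str:
    """Replace every ASCII digit with the mask character."""
    return re.sub(r"[0-9]", mask, chain)


def mask_alpha(chain: str, mask: str = "_") -> str:
    """Replace every ASCII letter with the mask character."""
    return re.sub(r"[A-Za-z]", mask, chain)


def inject_false_answers(chain: str, false_answer: str,
                         true_count: int = 1, ratio: int = 3,
                         seed: int = 0) -> str:
    """Insert ratio * true_count false-answer lines at random positions.

    The sentence template is hypothetical; the paper's noise format
    is not described in the abstract.
    """
    lines = chain.split("\n")
    rng = random.Random(seed)
    for _ in range(ratio * true_count):
        position = rng.randrange(len(lines) + 1)
        lines.insert(position, f"The answer is {false_answer}.")
    return "\n".join(lines)


if __name__ == "__main__":
    chain = "Tom has 3 apples.\nHe buys 4 more.\n3 + 4 = 7"
    print(shuffle_lines(chain))
    print(mask_digits(chain))
    print(mask_alpha(chain))
    print(inject_false_answers(chain, "9"))
```

Under this reading, each perturbed chain would be passed back to the model in place of the original chain and the extracted answer compared against the ground truth. That evaluation step is not shown here because the abstract does not specify the prompting setup.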

Related papers