Fugu-MT paper translation (abstract): More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

Paper overview: More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

arxiv url: http://arxiv.org/abs/2605.06672v1
Date: Tue, 21 Apr 2026 04:14:13 GMT
Status: translation complete
Last updated in system: 2026-05-25 12:34:33.664283
Title: More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models
Authors: Xiao Wang
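Abstract summary: Chain-of-thought (CoT) reasoning and reasoning-tuned models are commonly assumed to reduce shallow biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any reasoning-capable model, per-question position bias scales with the length of the reasoning trajectory.
Reference score (attention score computed independently by the site): 5.705685936981751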
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 are commonly assumed to reduce shallow heuristic biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any reasoning-capable model, per-question position bias scales with the length of the reasoning trajectory. Across thirteen reasoning-mode configurations (two R1-distilled 7-8B models, two base models prompted with CoT, and DeepSeek-R1 at 671B) on MMLU, ARC-Challenge, and GPQA, twelve show a positive partial correlation between trajectory length and Position Bias Score (PBS) after controlling for accuracy, ranging from 0.11 to 0.41 (all p < 0.05). All twelve open-weight reasoning-mode configurations show monotonically increasing PBS across length quartiles. A truncation intervention provides causal evidence: continuations resumed from later points in the trajectory are increasingly likely to shift toward position-preferred options (16% to 32% for R1-Qwen-7B across absolute-position buckets). At 671B, aggregate PBS collapses to 0.019, but the length effect still manifests in the longest quartile (PBS = 0.071), suggesting that accuracy gates the expression of length-driven bias rather than eliminating the underlying mechanism. We additionally find that direct-answer position bias is a distinct phenomenon with a different footprint (strong in Llama-Instruct-direct, weak in Qwen-Instruct-direct, and uncorrelated with trajectory length): CoT reasoning replaces this baseline bias with length-accumulated bias. Our results argue that reasoning-capable models should not be treated as order-robust by default in MCQ evaluation pipelines, and offer a diagnostic toolkit (PBS, commitment change point, effective switching, truncation probes) for auditing position bias in reasoning models.
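The abstract's two headline diagnostics (a partial correlation between trajectory length and PBS after controlling for accuracy, and PBS across length quartiles) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not define how PBS is computed per question, so the sketch takes per-question scores as given. The helper names `partial_corr` and `mean_by_quartile` are hypothetical, the partial correlation uses the standard residualization approach, and the data are synthetic.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])  # intercept + control
    # Residuals of x and y after a least-squares fit on the control variable.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

def mean_by_quartile(length, score):
    """Mean score within each quartile of length, shortest to longest."""
    length = np.asarray(length, dtype=float)
    score = np.asarray(score, dtype=float)
    edges = np.quantile(length, [0.25, 0.5, 0.75])
    bucket = np.searchsorted(edges, length, side="right")  # values 0..3
    return [float(score[bucket == q].mean()) for q in range(4)]

# Synthetic illustration only; these are NOT the paper's data.
rng = np.random.default_rng(0)
n = 2000
accuracy = rng.integers(0, 2, size=n).astype(float)  # 1 = answered correctly
length = rng.gamma(shape=2.0, scale=300.0, size=n) - 100.0 * accuracy
pbs = 0.0002 * length - 0.05 * accuracy + rng.normal(0.0, 0.1, size=n)

r = partial_corr(length, pbs, accuracy)
quartile_means = mean_by_quartile(length, pbs)
print(f"partial r = {r:.3f}")
print("mean score by length quartile:", [round(m, 3) for m in quartile_means])
```

Residualizing both variables on the control and correlating the residuals is equivalent to the usual first-order partial correlation formula. With heavily tied lengths a quartile bucket can come out empty, so a real audit would want to check bucket sizes before comparing means.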

Related papers