Fugu-MT Paper Translation (Abstract): Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information

Paper overview: Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information

arxiv url: http://arxiv.org/abs/2603.11749v1
Date: Thu, 12 Mar 2026 09:52:38 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.005684
Title: Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information
Title (reference translation): Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information
Authors: Konstantin Krestnikov
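Abstract summary: We examine why language models sometimes prefer correct statements even when trained on mixed-quality data. Truth bias emerges only when false alternatives are structurally harder to compress. These results suggest that what appears as a "truth bias" is a side effect of compression pressure and a preference for internal consistency.
Reference score (independently computed attention metric): 0.0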
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Why do language models sometimes prefer correct statements even when trained on mixed-quality data? We introduce the Compression-Consistency Principle: next-token prediction favors hypotheses that allow shorter and more internally consistent descriptions of the training data. Truth bias emerges only when false alternatives are structurally harder to compress. We test this using small GPT-2-style character-level transformers (3.5M to 86M parameters) on synthetic math corpora with controlled mixtures of correct and incorrect rules. In the random-error setting, models strongly prefer correct completions in paired evaluation: 83.1% accuracy at balanced data and 67.0% even when correct rules appear in only 10% of the corpus. Replacing random errors with a coherent but mathematically incorrect rule system largely eliminates the preference (near-chance accuracy). In a more natural-language-like synthetic world, the effect is weaker but still present (57.7%). Additional experiments show that embedding verification steps can restore preference for correctness even at small scale, while increasing the number of consistent rules produces a graded improvement in accuracy. Our results suggest that what appears as a "truth bias" is largely a side effect of compression pressure and preference for internal consistency, rather than an intrinsic drive toward truth. Full code and data are available at https://github.com/Rai220/compression-drives-truth.
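Abstract (reference translation): Why do language models sometimes prefer correct statements even when trained on mixed-quality data? We introduce the Compression-Consistency Principle: next-token prediction favors hypotheses that permit shorter and more internally consistent descriptions of the training data. Truth bias appears only when false alternatives are structurally harder to compress. We test this with GPT-2-style character-level transformers (3.5M to 86M parameters) on synthetic math corpora in which the mixture of correct and incorrect rules is controlled. In the random-error setting, models strongly prefer correct completions in paired evaluation: 83.1% accuracy with balanced data, and 67.0% even when correct rules appear in only 10% of the corpus. Replacing random errors with a coherent but mathematically incorrect rule system largely eliminates the preference (near-chance accuracy). In a more natural-language-like synthetic world, the effect is weaker but still present (57.7%). Further experiments show that embedding verification steps can restore the preference for correctness even at small scale, while increasing the number of consistent rules yields a graded improvement in accuracy. These results suggest that what appears as a "truth bias" is largely a side effect of compression pressure and a preference for internal consistency, not an intrinsic drive toward truth. Full code and data are available at https://github.com/Rai220/compression-drives-truth.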
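
The abstract does not spell out how paired evaluation is scored. A minimal sketch, assuming it means comparing the log-probability a model assigns to the correct and the incorrect completion of the same prompt, might look like the following. The names `paired_accuracy`, `code_length_bits`, `toy_log_prob`, and the lookup table are hypothetical, and the toy probabilities are invented for illustration; they are not results from the paper or code from its repository.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical record type: (prompt, correct completion, incorrect completion).
Pair = Tuple[str, str, str]


def paired_accuracy(log_prob: Callable[[str, str], float],
                    pairs: Sequence[Pair]) -> float:
    """Fraction of pairs where the correct completion scores strictly higher."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    wins = sum(
        1 for prompt, good, bad in pairs
        if log_prob(prompt, good) > log_prob(prompt, bad)
    )
    return wins / len(pairs)


def code_length_bits(log_prob_nats: float) -> float:
    """Convert a natural-log probability into a code length in bits (-log2 p)."""
    return -log_prob_nats / math.log(2)


# Toy stand-in for a trained model: invented completion probabilities.
TOY_PROBS = {
    ("2+3=", "5"): 0.6, ("2+3=", "7"): 0.1,
    ("4*2=", "8"): 0.5, ("4*2=", "9"): 0.2,
    ("9-4=", "5"): 0.2, ("9-4=", "3"): 0.3,  # the toy model gets this one wrong
    ("6/2=", "3"): 0.7, ("6/2=", "4"): 0.1,
}


def toy_log_prob(prompt: str, completion: str) -> float:
    """Look up the invented probability and return its natural log."""
    return math.log(TOY_PROBS[(prompt, completion)])


PAIRS = [
    ("2+3=", "5", "7"),
    ("4*2=", "8", "9"),
    ("9-4=", "5", "3"),
    ("6/2=", "3", "4"),
]

if __name__ == "__main__":
    print("paired accuracy:", paired_accuracy(toy_log_prob, PAIRS))
    print("bits for p=0.5:", code_length_bits(math.log(0.5)))
```

Under this reading, a completion with higher probability is also one with a shorter code length, which is the link between next-token prediction and compression that the abstract draws. With a real model, `toy_log_prob` would be replaced by a function that sums the per-character log-probabilities of the completion given the prompt.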

Related papers