Fugu-MT Paper Translation (Abstract): Directional Optimization Asymmetry in Transformers: A Synthetic Stress Test

Paper abstract: Directional Optimization Asymmetry in Transformers: A Synthetic Stress Test

arxiv url: http://arxiv.org/abs/2511.19997v1
Date: Tue, 25 Nov 2025 07:03:20 GMT
Status: Translation complete
Last updated in system: 2025-11-26 17:37:04.32315
Title: Directional Optimization Asymmetry in Transformers: A Synthetic Stress Test
Title (reference translation): Directional Optimization Asymmetry in Transformers: A Synthetic Stress Test
Authors: Mihir Sahasrabudhe
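Abstract summary: Transformers are theoretically reversal-invariant: their function class does not prefer left-to-right mappings over right-to-left ones. Recent work on temporal asymmetry in LLMs suggests that real-world corpora carry their own arrow of time. Do directional failures stem from linguistic statistics, or from the architecture itself?
Reference score (independently computed attention measure): 0.15229257192293197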
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers are theoretically reversal-invariant: their function class does not prefer left-to-right over right-to-left mappings. Yet empirical studies on natural language repeatedly report a "reversal curse," and recent work on temporal asymmetry in LLMs suggests that real-world corpora carry their own arrow of time. This leaves an unresolved question: do directional failures stem from linguistic statistics, or from the architecture itself? We cut through this ambiguity with a fully synthetic, entropy-controlled benchmark designed as a clean-room stress test for directional learning. Using random string mappings with tunable branching factor K, we construct forward tasks with zero conditional entropy and inverse tasks with analytically determined entropy floors. Excess loss above these floors reveals that even scratch-trained GPT-2 models exhibit a strong, reproducible directional optimization gap (e.g., 1.16 nats at K=5), far larger than that of an MLP trained on the same data. Pre-trained initializations shift optimization behavior but do not eliminate this gap, while LoRA encounters a sharp capacity wall on high-entropy inverse mappings. Together, these results isolate a minimal, semantics-free signature of directional friction intrinsic to causal Transformer training, one that persists even when linguistic priors, token frequencies, and corpus-level temporal asymmetries are removed. Our benchmark provides a controlled instrument for dissecting directional biases in modern sequence models and motivates deeper mechanistic study of why inversion remains fundamentally harder for Transformers.
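Abstract (reference translation): Transformers are theoretically reversal-invariant: their function class does not prefer left-to-right mappings over right-to-left ones. Nevertheless, empirical studies on natural language repeatedly report a "reversal curse," and recent work on temporal asymmetry in LLMs suggests that real-world corpora carry their own arrow of time. Do directional failures stem from linguistic statistics, or from the architecture itself? We resolve this ambiguity with a fully synthetic, entropy-controlled benchmark designed as a clean-room stress test for directional learning. Using random string mappings with a tunable branching factor K, we construct forward tasks with zero conditional entropy and inverse tasks with analytically determined entropy floors. Excess loss above these floors shows that even GPT-2 models trained from scratch exhibit a strong, reproducible directional optimization gap (e.g., 1.16 nats at K=5), far larger than that of an MLP trained on the same data. Pre-trained initializations shift optimization behavior but do not eliminate this gap, while LoRA hits a sharp capacity wall on high-entropy inverse mappings. These results isolate a minimal, semantics-free signature of directional friction intrinsic to causal Transformer training, one that persists even when linguistic priors, token frequencies, and corpus-level temporal asymmetries are removed. Our benchmark provides a controlled instrument for dissecting directional biases in modern sequence models and motivates deeper mechanistic study of why inversion remains fundamentally harder for Transformers.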
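The entropy bookkeeping described in the abstract can be illustrated with a small sketch. This is not the authors' benchmark code. It assumes that "branching factor K" means K distinct random source strings map to each target string, sampled uniformly, so the forward task (source to target) has zero conditional entropy and the inverse task (target to source) has a floor of ln K nats. The names `make_mapping`, `conditional_entropy`, and `excess_loss`, along with the string length and alphabet, are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def make_mapping(num_targets, k, length=8, seed=0):
    """Build a K-to-one map from random source strings to random target strings.

    Assumption: every target has exactly k distinct sources (not from the paper).
    """
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"

    def fresh(used):
        # Draw random strings until one is unused, so all strings are distinct.
        while True:
            s = "".join(rng.choice(alphabet) for _ in range(length))
            if s not in used:
                used.add(s)
                return s

    used_sources, used_targets = set(), set()
    mapping = {}
    for _ in range(num_targets):
        target = fresh(used_targets)
        for _ in range(k):
            mapping[fresh(used_sources)] = target
    return mapping


def conditional_entropy(pairs):
    """Empirical H(B | A) in nats over equally weighted (a, b) pairs."""
    by_a = defaultdict(Counter)
    for a, b in pairs:
        by_a[a][b] += 1
    n = len(pairs)
    h = 0.0
    for counts in by_a.values():
        total = sum(counts.values())
        for c in counts.values():
            # -p(a, b) * log p(b | a)
            h -= (c / n) * math.log(c / total)
    return h


def excess_loss(measured_loss_nats, floor_nats):
    """Loss above the analytic entropy floor; zero would mean the floor is reached."""
    return measured_loss_nats - floor_nats


mapping = make_mapping(num_targets=50, k=5)
forward_pairs = list(mapping.items())                # source -> target
inverse_pairs = [(y, x) for x, y in forward_pairs]   # target -> source

forward_floor = conditional_entropy(forward_pairs)   # deterministic direction
inverse_floor = conditional_entropy(inverse_pairs)   # ln K under the assumption above
```

Under this assumed construction, the inverse floor at K=5 is ln 5, roughly 1.609 nats per mapped string. The 1.16 nats quoted in the abstract is a directional optimization gap measured relative to such floors, and the paper's exact normalization (per sequence or per token) is not specified here.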

Related papers