Fugu-MT Paper Translation (Overview): Cutting the Skip: Training Residual-Free Transformers

Paper overview: Cutting the Skip: Training Residual-Free Transformers

arxiv url: http://arxiv.org/abs/2510.00345v1
Date: Tue, 30 Sep 2025 23:07:45 GMT
Status: Translation complete
Last updated (in system): 2025-10-03 16:59:20.289682
Title: Cutting the Skip: Training Residual-Free Transformers
Authors: Yiping Ji, James Martens, Jianqiao Zheng, Ziqin Zhou, Peyman Moghadam, Xinyu Zhang, Hemanth Saratchandran, Simon Lucey
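Abstract summary: Skip connections disrupt the hierarchical structure of representations. We show why skips improve conditioning and reveal that their stabilizing benefits can be recovered through a principled initialization strategy. We propose the first method that enables stable and efficient training of skipless transformers without altering the standard architecture.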
Reference score (independently computed attention score): 36.44084551425791
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Transformers have achieved remarkable success across a wide range of applications, a feat often attributed to their scalability. Yet training them without skip (residual) connections remains notoriously difficult. While skips stabilize optimization, they also disrupt the hierarchical structure of representations, raising the long-standing question of whether transformers can be trained efficiently without them. In this work, we address this problem by analyzing the Jacobian of a skipless transformer block, showing why skips improve conditioning and revealing that their stabilization benefits can be recovered through a principled initialization strategy. Building on this insight, we introduce the first method that enables stable and efficient training of skipless transformers without altering the standard architecture. We validate our approach on Vision Transformers (ViTs) in both supervised and self-supervised settings, demonstrating that skipless ViTs trained with our initialization overcome the usual optimization barriers, learn richer hierarchical representations, and outperform strong baselines, that incorporate skip connections, on dense prediction benchmarks. These results show that skip connections are not a fundamental requirement for training ViTs and open new avenues for hierarchical representation learning in vision models.
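The abstract's central observation is that skips improve the conditioning of a block's Jacobian. A general fact behind this is that a residual block y = x + f(x) has Jacobian I + J_f, whereas a skipless block y = f(x) has Jacobian J_f alone. The toy sketch below illustrates only that fact. It is not the paper's Jacobian analysis or its initialization method. The linear block f(x) = W x, the width `n`, the weight `scale`, and the helper names are all hypothetical choices made for illustration.

```python
import numpy as np


def condition_number(jac: np.ndarray) -> float:
    """Ratio of the largest to the smallest singular value of a square Jacobian."""
    s = np.linalg.svd(jac, compute_uv=False)
    return float(s.max() / s.min())


def toy_jacobians(n: int = 64, scale: float = 0.1, seed: int = 0):
    """Jacobians of a hypothetical linear block f(x) = W x, with and without a skip."""
    rng = np.random.default_rng(seed)
    # Small random weights: entries ~ N(0, scale^2 / n), so ||W||_2 is roughly 2 * scale.
    w = scale * rng.standard_normal((n, n)) / np.sqrt(n)
    plain = w                 # Jacobian of y = f(x)      (skipless)
    residual = np.eye(n) + w  # Jacobian of y = x + f(x)  (with skip)
    return plain, residual


plain, residual = toy_jacobians()
print(f"skipless condition number: {condition_number(plain):.1f}")
print(f"residual condition number: {condition_number(residual):.2f}")
```

In this toy setting the identity term keeps every singular value of the residual Jacobian near 1, while a square random matrix on its own is typically badly conditioned. The abstract states that the paper's contribution is to recover this stabilizing effect through initialization, without adding the skip back into the architecture.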


Related papers