Fugu-MT Paper Translation (Abstract): Sinkhorn doubly stochastic attention rank decay analysis

Paper overview: Sinkhorn doubly stochastic attention rank decay analysis

arxiv url: http://arxiv.org/abs/2604.07925v1
Date: Thu, 09 Apr 2026 07:46:18 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:05.774932
Title: Sinkhorn doubly stochastic attention rank decay analysis
Title (reference translation): Sinkhorn doubly stochastic attention rank decay analysis
Authors: Michela Lapenna, Rita Fioresi, Bahman Gharesifard
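Abstract summary: The paper shows that doubly stochastic attention normalized with the Sinkhorn algorithm preserves rank more effectively than standard Softmax row-stochastic attention. It derives a theoretical bound for the rank decay of pure self-attention under Sinkhorn normalization and finds that rank decays to one doubly exponentially with depth.
Reference score (independently computed attention metric): 1.376408511310322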
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The self-attention mechanism is central to the success of Transformer architectures. However, standard row-stochastic attention has been shown to suffer from significant signal degradation across layers. In particular, it can induce rank collapse, resulting in increasingly uniform token representations, as well as entropy collapse, characterized by highly concentrated attention distributions. Recent work has highlighted the benefits of doubly stochastic attention as a form of entropy regularization, promoting a more balanced attention distribution and leading to improved empirical performance. In this paper, we study rank collapse across network depth and show that doubly stochastic attention matrices normalized with the Sinkhorn algorithm preserve rank more effectively than standard Softmax row-stochastic ones. As previously shown for Softmax, skip connections are crucial to mitigate rank collapse. We empirically validate this phenomenon on both sentiment analysis and image classification tasks. Moreover, we derive a theoretical bound for the pure self-attention rank decay when using Sinkhorn normalization and find that rank decays to one doubly exponentially with depth, a phenomenon that has already been shown for Softmax.
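Abstract (reference translation): The self-attention mechanism is central to the success of Transformer architectures. However, standard row-stochastic attention has been shown to suffer from signal degradation across layers. In particular, it can induce rank collapse, which yields increasingly uniform token representations, as well as entropy collapse, which is characterized by highly concentrated attention distributions. Recent work has highlighted the benefits of doubly stochastic attention as a form of entropy regularization that promotes a more balanced attention distribution and improves empirical performance. This paper studies rank collapse across network depth and shows that doubly stochastic attention matrices normalized with the Sinkhorn algorithm preserve rank more effectively than standard Softmax row-stochastic matrices. As previously shown for Softmax, skip connections are essential for mitigating rank collapse. The authors validate this phenomenon empirically on both sentiment analysis and image classification. They also derive a theoretical bound for the rank decay of pure self-attention under Sinkhorn normalization and find that rank decays to one doubly exponentially with depth, as has already been shown for Softmax.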
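The contrast between row-stochastic (Softmax) and doubly stochastic (Sinkhorn) attention can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Sinkhorn normalization is applied to the exponentiated score matrix by alternating row and column normalization, and the function names (`softmax_rows`, `sinkhorn`, `row_sums`, `col_sums`), the iteration count, and the example scores are all hypothetical choices made for illustration.

```python
import math


def softmax_rows(scores):
    """Row-stochastic normalization: every row of the result sums to 1."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sinkhorn(scores, n_iters=50):
    """Approximately doubly stochastic normalization of a square score matrix.

    Alternates row and column normalization of exp(scores). The value of
    n_iters is an illustrative assumption, not a setting from the paper.
    """
    n = len(scores)
    m = max(max(row) for row in scores)  # global shift; the result is scale-invariant
    k = [[math.exp(s - m) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: make every row sum to 1.
        k = [[v / sum(row) for v in row] for row in k]
        # Column step: make every column sum to 1.
        sums = [sum(k[i][j] for i in range(n)) for j in range(n)]
        k = [[k[i][j] / sums[j] for j in range(n)] for i in range(n)]
    return k


def row_sums(a):
    return [sum(row) for row in a]


def col_sums(a):
    return [sum(a[i][j] for i in range(len(a))) for j in range(len(a[0]))]


# Hypothetical 3x3 attention scores (e.g. scaled query-key products).
scores = [[0.2, 1.5, -0.3], [0.9, 0.1, 0.4], [-1.0, 0.7, 2.0]]
A_soft = softmax_rows(scores)
A_sink = sinkhorn(scores)
```

In this sketch, `row_sums(A_soft)` is all ones while `col_sums(A_soft)` generally is not, whereas both `row_sums(A_sink)` and `col_sums(A_sink)` are close to one. That balance across rows and columns is the property the abstract refers to as doubly stochastic attention.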


Related papers