Fugu-MT Paper Translation (Abstract): Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation

Paper summary: Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation

arxiv url: http://arxiv.org/abs/2305.19798v2
Date: Tue, 5 Dec 2023 09:26:05 GMT
Status: Translation complete
Last updated in system: 2023-12-06 20:09:06.910818
Title: Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation
Title (reference translation): Primal-Attention: Self-attention through asymmetric kernel SVD
Authors: Yingyi Chen, Qinghua Tao, Francesco Tonin, Johan A.K. Suykens
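Abstract summary: We provide a new perspective for representing and optimizing self-attention through asymmetric Kernel Singular Value Decomposition (KSVD). We show that KSVD optimization can be implemented by minimizing a regularization loss, which promotes the low-rank property without extra decomposition. This is the first work that provides a primal-dual representation for the asymmetric kernel in self-attention and successfully applies it to modeling and optimization.
Reference score (independently computed attention measure): 21.87428356353377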
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recently, a new line of works has emerged to understand and improve self-attention in Transformers by treating it as a kernel machine. However, existing works apply the methods for symmetric kernels to the asymmetric self-attention, resulting in a nontrivial gap between the analytical understanding and numerical implementation. In this paper, we provide a new perspective to represent and optimize self-attention through asymmetric Kernel Singular Value Decomposition (KSVD), which is also motivated by the low-rank property of self-attention normally observed in deep layers. Through asymmetric KSVD, $i$) a primal-dual representation of self-attention is formulated, where the optimization objective is cast to maximize the projection variances in the attention outputs; $ii$) a novel attention mechanism, i.e., Primal-Attention, is proposed via the primal representation of KSVD, avoiding explicit computation of the kernel matrix in the dual; $iii$) with KKT conditions, we prove that the stationary solution to the KSVD optimization in Primal-Attention yields a zero-value objective. In this manner, KSVD optimization can be implemented by simply minimizing a regularization loss, so that low-rank property is promoted without extra decomposition. Numerical experiments show state-of-the-art performance of our Primal-Attention with improved efficiency. Moreover, we demonstrate that the deployed KSVD optimization regularizes Primal-Attention with a sharper singular value decay than that of the canonical self-attention, further verifying the great potential of our method. To the best of our knowledge, this is the first work that provides a primal-dual representation for the asymmetric kernel in self-attention and successfully applies it to modeling and optimization.
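Abstract (reference translation): Recently, a new line of work has emerged to understand and improve self-attention in Transformers by treating it as a kernel machine. However, existing works apply methods for symmetric kernels to the asymmetric self-attention, resulting in a nontrivial gap between analytical understanding and numerical implementation. In this paper, we provide a new perspective for representing and optimizing self-attention through asymmetric Kernel Singular Value Decomposition (KSVD). Through asymmetric KSVD, $i$) a primal-dual representation of self-attention is formulated, where the optimization objective is cast to maximize the projection variances in the attention outputs; $ii$) a novel attention mechanism, i.e., Primal-Attention, is proposed via the primal representation of KSVD, avoiding explicit computation of the kernel matrix in the dual; $iii$) with KKT conditions, we prove that the stationary solution to the KSVD optimization in Primal-Attention yields a zero-value objective. In this manner, KSVD optimization can be implemented by simply minimizing a regularization loss, so that the low-rank property is promoted without extra decomposition. Numerical experiments show state-of-the-art performance of Primal-Attention with improved efficiency. Moreover, we show that the KSVD optimization regularizes Primal-Attention with a sharper singular value decay than that of canonical self-attention, further verifying the potential of the proposed method. To the best of our knowledge, this is the first work that provides a primal-dual representation for the asymmetric kernel in self-attention and successfully applies it to modeling and optimization.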
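
The abstract rests on two properties of canonical self-attention: its kernel (attention) matrix is asymmetric, and its singular values can be inspected for low-rank structure. The following is a minimal NumPy sketch of those two properties only. It is not the authors' Primal-Attention implementation: the random inputs, the dimensions, the helper name `softmax_attention`, and the 90% energy threshold are all assumptions chosen for illustration.

```python
import numpy as np


def softmax_attention(X, W_q, W_k):
    """Row-wise softmax attention matrix for inputs X (hypothetical helper)."""
    Q = X @ W_q                       # queries, shape (n, d_head)
    K = X @ W_k                       # keys, shape (n, d_head)
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    return A / A.sum(axis=1, keepdims=True)       # each row sums to 1


# Assumed toy sizes: 32 tokens, model width 16, head width 8.
rng = np.random.default_rng(0)
n, d, d_head = 32, 16, 8
X = rng.standard_normal((n, d))
W_q = rng.standard_normal((d, d_head))
W_k = rng.standard_normal((d, d_head))

A = softmax_attention(X, W_q, W_k)

# Queries and keys use different projections, so A is generally not symmetric.
is_symmetric = np.allclose(A, A.T)

# An asymmetric matrix has distinct left and right singular vectors (U and Vt),
# which is why SVD, rather than a symmetric eigendecomposition, is the natural tool.
U, s, Vt = np.linalg.svd(A)

# Number of singular values needed to capture 90% of the squared spectrum:
# a simple proxy for how quickly the singular values decay.
energy = np.cumsum(s**2) / np.sum(s**2)
rank_90 = int(np.searchsorted(energy, 0.90) + 1)

print("symmetric:", is_symmetric)
print("leading singular values:", s[:5])
print("rank for 90% energy:", rank_90)
```

Under these assumptions, a smaller `rank_90` corresponds to a sharper singular value decay, which is the quantity the abstract compares between Primal-Attention and canonical self-attention.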

Related papers

ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans [13.695885742446027]
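Self-attention can over-concentrate on a few tokens during training, resulting in suboptimal information flow. We propose a novel, parallelizable doubly-stochastic attention mechanism based on sliced optimal transport. Our method enforces double stochasticity without iterative Sinkhorn normalization, significantly improving efficiency.
Paper reference translation (metadata) (2025-02-11T21:20:48Z)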
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
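We introduce an orthogonal fine-tuning method that efficiently fine-tunes pretrained weights, achieving enhanced robustness and generalization. A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the resulting method is called OrthSR. We revisit CLIP and CoOp and effectively improve the models in few-shot image classification scenarios.
Paper reference translation (metadata) (2024-07-11T10:35:53Z)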
Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method [21.16129116282759]
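We introduce a new asymmetric learning paradigm based on the coupled covariances eigenproblem (CCE). Using a finite-sample approximation, we formulate the asymmetric Nyström method to speed up training.
Paper reference translation (metadata) (2024-06-13T02:12:18Z)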
Implicit Bias and Fast Convergence Rates for Self-attention [30.08303212679308]
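Self-attention, the core mechanism of transformers, distinguishes them from traditional neural networks and drives their outstanding performance. We study the implicit bias of gradient descent (GD) when training a self-attention layer with a fixed linear decoder in binary classification. We provide the first finite-time convergence rate for $W_t$ to $W_{mm}$, along with the rate of sparsification of the attention map.
Paper reference translation (metadata) (2024-02-08T15:15:09Z)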
Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes [20.023544206079304]
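We propose the Kernel-Eigen Pair Sparse Variational Gaussian Process (KEP-SVGP) for building uncertainty-aware self-attention. We verify its excellent performance and efficiency on in-distribution, distribution-shift, and out-of-distribution benchmarks.
Paper reference translation (metadata) (2024-02-02T15:05:13Z)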
Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing [28.295987262533075]
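Gradient descent (GD) is crucial for the generalization of machine learning models. We show that GD induces implicit regularization and promotes compact representations. Our results underscore the importance of the tensor parametrization of matrix sensing.
Paper reference translation (metadata) (2023-10-24T06:40:26Z)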
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
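This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. Instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape, and we show how linear interpolation can help by leveraging the theory of nonexpansive operators.
Paper reference translation (metadata) (2023-10-20T12:45:12Z)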
Transformers as Support Vector Machines [54.642793677472724]
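We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem. We characterize the implicit bias of one-layer transformers optimized with gradient descent. We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
Paper reference translation (metadata) (2023-08-31T17:57:50Z)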
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers [52.468311268601056]
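This paper analyzes attention through the lens of convex duality. We derive equivalent finite-dimensional convex problems that are interpretable and solvable to global optimality. We show how self-attention networks implicitly cluster tokens.
Paper reference translation (metadata) (2022-05-17T04:01:15Z)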
On the Efficient Implementation of the Matrix Exponentiated Gradient Algorithm for Low-Rank Matrix Optimization [26.858608065417663]
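Convex optimization over the spectrahedron has important applications in machine learning, signal processing, and statistics. We propose an efficient implementation of the matrix exponentiated gradient (MEG) algorithm suited to optimization with low-rank matrices, using only a single low-rank SVD at each iteration. We also provide efficiently computable certificates for the correct convergence of our method.
Paper reference translation (metadata) (2020-12-18T19:14:51Z)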
Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
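We design regularization-free algorithms for the high-dimensional single index model. We provide theoretical guarantees for the implicit regularization phenomenon.
Paper reference translation (metadata) (2020-07-16T13:27:47Z)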
Controllable Orthogonalization in Training DNNs [96.1365404059924]
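Orthogonality is widely used for training deep neural networks (DNNs). This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI). We propose a method to effectively control orthogonality so as to provide an optimal trade-off between optimization benefits and reduced representational capacity, improving the performance of image classification networks. We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of the network, similar to spectral normalization.
Paper reference translation (metadata) (2020-04-02T10:14:27Z)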

The related papers list is generated automatically from the titles and abstracts of papers on this site.
