Fugu-MT Paper Translation (Summary): More Expressive Attention with Negative Weights

Paper summary: More Expressive Attention with Negative Weights

arxiv url: http://arxiv.org/abs/2411.07176v3
Date: Thu, 30 Jan 2025 18:17:13 GMT
Status: Translation complete
Last updated in system: 2025-01-31 16:35:21.115929
Title: More Expressive Attention with Negative Weights
Title (reference translation): More Expressive Attention with Negative Weights
Authors: Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan
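Abstract summary: This paper proposes Cog Attention, a novel attention mechanism that allows attention weights to be negative for greater expressiveness. The approach suggests a promising research direction for rethinking and breaking the constraints of conventional softmax attention.
Reference score (independently computed attention score): 36.40344438470477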
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We propose a novel attention mechanism, named Cog Attention, that enables attention weights to be negative for enhanced expressiveness, which stems from two key factors: (1) Cog Attention enhances parameter flexibility. For example, unlike traditional softmax attention heads that use a static output-value (OV) matrix to delete or copy inputs that the heads attend to, Cog Attention naturally learns to use the sign of dynamic query-key (QK) inner products to represent these operations. This enables Cog Attention to perform multiple operations simultaneously within a single head. Meanwhile, Cog Attention's OV matrix can focus more on refinement or modification. (2) Cog Attention enhances the model's robustness against representational collapse by preventing the ``over-squashing'' of earlier tokens into later positions. We develop Transformer-like models which use Cog Attention as attention modules, including decoder-only models at various scales for language modeling and U-ViT diffusion models for image generation. Experiments show that models using Cog Attention exhibit superior performance compared to those employing traditional softmax attention modules. Our approach suggests a promising research direction for rethinking and breaking the entrenched constraints of traditional softmax attention, such as the requirement for non-negative weights.
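Abstract (reference translation): We propose Cog Attention, a novel attention mechanism that allows attention weights to be negative for greater expressiveness. This expressiveness stems from two factors. (1) Cog Attention increases parameter flexibility. For example, unlike conventional softmax attention heads, which use a static output-value (OV) matrix to delete or copy the inputs the heads attend to, Cog Attention naturally learns to represent these operations with the sign of the dynamic query-key (QK) inner products. This lets Cog Attention perform multiple operations simultaneously within a single head, while its OV matrix can focus more on refinement or modification. (2) Cog Attention improves the model's robustness against representational collapse by preventing the "over-squashing" of earlier tokens into later positions. We develop Transformer-like models that use Cog Attention as their attention modules, including decoder-only models for language modeling and U-ViT diffusion models for image generation. Experiments show that models using Cog Attention outperform those using conventional softmax attention modules. The approach suggests a promising research direction for rethinking and breaking the constraints of conventional softmax attention, such as the requirement for non-negative weights.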
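The abstract does not give the formula for Cog Attention, so the following is only a minimal sketch of what "attention weights that can be negative" might look like in practice. It assumes one simple construction (a softmax over the absolute scores, with the sign of each query-key score reattached); the function names and the toy inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax_attention_weights(scores):
    """Standard softmax over the last axis: non-negative weights that sum to 1."""
    shifted = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def signed_attention_weights(scores):
    """Illustrative signed variant (an assumption, not the paper's stated formula):
    softmax over |scores|, then reattach the sign of each score."""
    return np.sign(scores) * softmax_attention_weights(np.abs(scores))

# Toy example: one query attending to three keys.
scores = np.array([[2.0, -2.0, 0.5]])                     # query-key inner products
values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # one value vector per key

w_soft = softmax_attention_weights(scores)    # all weights >= 0
w_signed = signed_attention_weights(scores)   # weight for the second key is < 0
out_soft = w_soft @ values
out_signed = w_signed @ values                # second value vector is subtracted
```

In this sketch, a negative query-key score makes the head subtract the corresponding value vector instead of adding it, which is one way to read the abstract's remark that the sign of the QK inner product can represent operations such as deleting or copying within a single head.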

Related papers

Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models [7.80071686970278]
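Conventional softmax attention suffers from numerical instability and degraded performance as the number of inference tokens grows. This paper addresses these issues by decomposing the softmax operation into a non-linear transformation and the $l_1$-norm. The authors build a new attention mechanism that outperforms conventional softmax attention across a range of inference lengths.
Paper reference translation (metadata) (2025-01-23T07:21:08Z)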
Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models [64.67721492968941]
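The paper proposes Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR). The goal is to preserve the generalization of the CLIP model while enhancing its adversarial robustness. The method improves zero-shot accuracy by 9.58% over current state-of-the-art techniques.
Paper reference translation (metadata) (2024-10-29T07:15:09Z)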
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective [52.778766190479374]
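Latent-based image generative models have achieved notable success in image generation tasks. Despite sharing the same latent space, autoregressive models lag significantly behind LDMs and MIMs in image generation. This paper proposes a simple but effective discrete image tokenizer that stabilizes the latent space for image generative modeling.
Paper reference translation (metadata) (2024-10-16T12:13:17Z)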
Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering [1.8786950286587742]
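As model size grows, high-norm artifacts appear anomalously in the patches of multi-head attention. The paper proposes Inference-Time Attention Engineering (ITAE), which manipulates the attention function during inference. ITAE improves clustering accuracy on multiple datasets and exhibits more expressive features in the latent space.
Paper reference translation (metadata) (2024-10-07T07:26:10Z)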
A Primal-Dual Framework for Transformers and Neural Networks [52.814467832108875]
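Self-attention is key to the remarkable success of transformers in sequence modeling tasks. The paper shows that self-attention corresponds to the support vector expansion derived from a support vector regression problem. It proposes two new attention mechanisms: Batch Normalized Attention (Attention-BN) and Scaled Head (Attention-SH).
Paper reference translation (metadata) (2024-06-19T19:11:22Z)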
Simple linear attention language models balance the recall-throughput tradeoff [60.06020449520365]
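The paper proposes BASED, a simple architecture that combines linear and sliding-window attention. The authors train language models with up to 1.3b parameters and show that BASED matches the strongest sub-quadratic models in perplexity and outperforms them by 6.22 accuracy points on real-world recall-intensive tasks.
Paper reference translation (metadata) (2024-02-28T19:28:27Z)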
FAST: Factorizable Attention for Speeding up Transformers [1.3637227185793512]
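This paper presents a linearly scaling attention mechanism that maintains the full representation of the attention matrix without sparsification. The results suggest that the attention mechanism performs robustly and holds significant promise for the diverse applications in which self-attention is used.
Paper reference translation (metadata) (2024-02-12T18:59:39Z)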
Revisiting Attention Weights as Explanations from an Information Theoretic Perspective [4.499369811647602]
The study shows that attention mechanisms can serve as a shortcut to model explanations when they are carefully combined with other model elements.
Paper reference translation (metadata) (2022-10-31T12:53:20Z)
Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
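Recent work has proposed various sparse attention modules to overcome the quadratic cost of self-attention. This paper proposes a model that addresses both problems by representing each attention with a mixed-membership stochastic block model. The model outperforms previous efficient variants as well as the original Transformer with full attention.
Paper reference translation (metadata) (2022-10-27T15:30:52Z)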
Causal Attention for Vision-Language Tasks [142.82608295995652]
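The paper introduces a new attention mechanism, Causal Attention (CATT). CATT removes a persistent shortcoming of existing attention-based vision-language models. In particular, CATT shows great potential in large-scale pre-training.
Paper reference translation (metadata) (2021-03-05T06:38:25Z)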
SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
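Transformer-based models are popular for natural language processing (NLP) tasks because of their strong capability. Visualizing the attention maps of a pre-trained model is one direct way to understand the self-attention mechanism. This work proposes a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
Paper reference translation (metadata) (2021-02-25T14:13:44Z)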
Gaussian Constrained Attention Network for Scene Text Recognition [16.485898019983797]
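Existing attention mechanisms face the problem of attention diffusion, in which the model may fail to focus on a specific feature region. This paper proposes a 2D attention-based method that incorporates a novel Gaussian Constrained Refinement Module. As a result, the attention weights become more concentrated and the attention-based recognition network achieves better performance.
Paper reference translation (metadata) (2020-10-19T01:55:30Z)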

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

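This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).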