Fugu-MT Paper Translation (Summary): Convolution-enhanced Evolving Attention Networks

Paper summary: Convolution-enhanced Evolving Attention Networks

arxiv url: http://arxiv.org/abs/2212.08330v1
Date: Fri, 16 Dec 2022 08:14:04 GMT
Status: Translation complete
Last updated in system: 2022-12-19 14:11:27.231081
Title: Convolution-enhanced Evolving Attention Networks
Authors: Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong
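Abstract summary: This paper proposes a novel, generic attention mechanism that directly models the evolution of inter-token relationships through a chain of residual convolutional modules. Our implementation, the EA-DC- (Evolving Attention-enhanced Dilated Convolutional) Transformer, significantly outperforms state-of-the-art models on time-series representation tasks.
Reference score (popularity, computed in-house): 41.684265133316096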
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The major motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, thus adding a residual connection can facilitate the information flow of inter-token relationships across layers. On the other hand, there is naturally an evolutionary trend among attention maps at different abstraction levels, so it is beneficial to exploit a dedicated convolution-based module to capture this process. Equipped with the proposed mechanism, the convolution-enhanced evolving attention networks achieve superior performance in various applications, including time-series representation, natural language understanding, machine translation, and image classification. Especially on time-series representation tasks, Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer outperforms state-of-the-art models significantly, achieving an average of 17% improvement compared to the best SOTA. To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps. Our implementation is available at https://github.com/pkuyym/EvolvingAttention
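The abstract describes the mechanism only at a high level: each layer's attention map is linked to the previous layer's map by a residual connection and refined by a convolution module. The following is a minimal pure-Python sketch of that idea under stated assumptions. It uses a single attention head, a 3×3 convolution over the attention-logit map, random stand-ins for the learned Q/K projections, and hypothetical mixing weights `alpha` (residual link to the previous layer) and `beta` (weight of the convolution branch). These names and the exact combination rule are illustrative assumptions, not the authors' implementation, which is available in the repository linked above.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def matmul_t(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b^T for row-major nested lists."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable row-wise softmax (each row sums to 1)."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def conv3x3(m: Matrix, kernel: Matrix) -> Matrix:
    """Same-size 2-D cross-correlation with zero padding (as in typical conv layers)."""
    n_rows, n_cols = len(m), len(m[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, c = i + di, j + dj
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        acc += kernel[di + 1][dj + 1] * m[r][c]
            out[i][j] = acc
    return out


def evolving_attention_logits(q: Matrix, k: Matrix, prev_logits: Optional[Matrix],
                              kernel: Matrix, alpha: float = 0.5,
                              beta: float = 0.5) -> Matrix:
    """Hypothetical evolving-attention step: fresh scores + residual + convolution."""
    scale = 1.0 / math.sqrt(len(q[0]))
    fresh = [[v * scale for v in row] for row in matmul_t(q, k)]  # Q K^T / sqrt(d)
    if prev_logits is None:  # first layer: nothing to evolve from
        return fresh
    # Residual link: blend the previous layer's attention logits into this layer's.
    mixed = [[alpha * p + (1 - alpha) * f for p, f in zip(pr, fr)]
             for pr, fr in zip(prev_logits, fresh)]
    # Convolution module treats the attention map as an image; keep a skip path.
    conv = conv3x3(mixed, kernel)
    return [[beta * c + (1 - beta) * x for c, x in zip(cr, xr)]
            for cr, xr in zip(conv, mixed)]


def run_chain(num_layers: int, seq_len: int, d: int, seed: int = 0) -> List[Matrix]:
    """Stack layers so each attention map depends on the previous layer's logits."""
    rng = random.Random(seed)
    prev: Optional[Matrix] = None
    maps = []
    for _ in range(num_layers):
        # Random stand-ins for learned Q/K projections and a per-layer conv kernel.
        q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(seq_len)]
        k = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(seq_len)]
        kernel = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
        logits = evolving_attention_logits(q, k, prev, kernel)
        maps.append(softmax_rows(logits))
        prev = logits
    return maps
```

For example, `run_chain(num_layers=3, seq_len=4, d=8)` returns three 4×4 row-stochastic attention maps. From the second layer on, each map depends on the previous layer's logits through the residual convolution, which is the explicit cross-layer interaction the abstract says most existing attention networks lack.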

Related papers