Fugu-MT Paper Translation (Abstract): Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling

Paper overview: Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling

arxiv url: http://arxiv.org/abs/2510.10060v1
Date: Sat, 11 Oct 2025 06:54:10 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.758425
Title: Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
Title (reference translation): Translution: Unifying Self-Attention and Convolution for Adaptive and Relative Modeling
Authors: Hehe Fan, Yi Yang, Mohan Kankanhalli, Fei Wu
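Abstract summary: This paper introduces Translution, an operation that unifies the adaptive identification capability of self-attention and the relative encoding capability of convolution. Experiments on computer vision and natural language processing tasks show that Translution achieves higher accuracy than self-attention.
Reference score (independently computed attention level): 34.84084078479298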
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: When modeling a given type of data, we consider it to involve two key aspects: 1) identifying relevant elements (e.g., image pixels or textual words) to a central element, as in a convolutional receptive field, or to a query element, as in self-attention, and 2) encoding these tokens effectively. Self-attention can adaptively identify these elements but relies on absolute positional embedding for structural representation learning. In contrast, convolution encodes elements in a relative manner, yet their fixed kernel size limits their ability to adaptively select the relevant elements. In this paper, we introduce Translution, an operation that unifies the adaptive identification capability of self-attention and the relative encoding advantage of convolution. However, this integration leads to a substantial increase in the number of parameters, exceeding most currently available computational resources. Therefore, we propose a lightweight variant of Translution, named α-Translution. Experiments on computer vision and natural language processing tasks show that Translution (including α-Translution) achieves superior accuracy compared to self-attention. The code is available at https://github.com/hehefan/Translution.
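Abstract (reference translation): When modeling a given type of data, we consider two key aspects: 1) identifying the elements (e.g., image pixels or textual words) relevant to a central element, as in a convolutional receptive field, or to a query element, as in self-attention, and 2) encoding these tokens effectively. Self-attention can identify these elements adaptively but relies on absolute positional embedding for structural representation learning. In contrast, convolution encodes elements in a relative manner, but its fixed kernel size limits its ability to select the relevant elements adaptively. This paper introduces Translution, an operation that combines the adaptive identification capability of self-attention with the relative encoding capability of convolution. However, this integration substantially increases the number of parameters, exceeding most currently available computational resources. The paper therefore proposes α-Translution, a lightweight variant of Translution. Experiments on computer vision and natural language processing tasks show that Translution (including α-Translution) achieves better accuracy than self-attention. The code is available at https://github.com/hehefan/Translution.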
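
The contrast drawn in the abstract can be made concrete with a toy sketch. The abstract gives no formulas, so everything below is an assumption made for illustration and is not the authors' implementation: features are scalars, the function names (`conv1d`, `self_attention`, `offset_aware_attention`, `value_param_count`) are hypothetical, and the "combined" operation simply picks a value weight by relative offset while keeping softmax attention weights. The parameter count likewise assumes one d×d value matrix per relative offset in a 1-D sequence.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conv1d(x, w):
    """Fixed window: w maps relative offset -> weight; out-of-range positions are skipped."""
    n = len(x)
    return [sum(w[d] * x[i + d] for d in w if 0 <= i + d < n) for i in range(n)]

def self_attention(x, wq, wk, wv):
    """Adaptive weights over all positions, but one shared wv regardless of offset."""
    out = []
    for xi in x:
        a = softmax([(wq * xi) * (wk * xj) for xj in x])
        out.append(sum(aij * wv * xj for aij, xj in zip(a, x)))
    return out

def offset_aware_attention(x, wq, wk, wv_by_offset):
    """Hypothetical combination: attention weights plus a value weight chosen by offset j - i."""
    n = len(x)
    out = []
    for i in range(n):
        a = softmax([(wq * x[i]) * (wk * x[j]) for j in range(n)])
        out.append(sum(a[j] * wv_by_offset[j - i] * x[j] for j in range(n)))
    return out

def value_param_count(n, d, per_offset):
    """d*d value weights, either shared or one set per relative offset in -(n-1)..(n-1)."""
    return (2 * n - 1) * d * d if per_offset else d * d

x = [1.0, 2.0, 3.0]
print(conv1d(x, {-1: 0.5, 0: 1.0, 1: 0.5}))
print(self_attention(x, 0.3, 0.2, 0.7))
print(offset_aware_attention(x, 0.3, 0.2, {-2: 0.1, -1: 0.4, 0: 0.7, 1: 0.4, 2: 0.1}))
print(value_param_count(196, 64, per_offset=False), value_param_count(196, 64, per_offset=True))
```

Under these assumptions, giving every offset the same value weight reduces `offset_aware_attention` to `self_attention`, and the per-offset parameter count grows linearly with sequence length, which is one way to read the abstract's remark about the increase in parameters.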


Related papers