On the Integration of Self-Attention and Convolution
- URL: http://arxiv.org/abs/2111.14556v1
- Date: Mon, 29 Nov 2021 14:37:05 GMT
- Title: On the Integration of Self-Attention and Convolution
- Authors: Xuran Pan, Chunjiang Ge, Rui Lu, Shiji Song, Guanfu Chen, Zeyi Huang,
Gao Huang
- Abstract summary: Convolution and self-attention are powerful techniques for representation learning.
In this paper, we show that there exists a strong underlying relation between them.
We show that the bulk of computations of these two paradigms are in fact done with the same operation.
- Score: 33.899471118470416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolution and self-attention are two powerful techniques for representation
learning, and they are usually considered as two peer approaches that are
distinct from each other. In this paper, we show that there exists a strong
underlying relation between them, in the sense that the bulk of computations of
these two paradigms are in fact done with the same operation. Specifically, we
first show that a traditional convolution with kernel size k x k can be
decomposed into k^2 individual 1x1 convolutions, followed by shift and
summation operations. Then, we interpret the projections of queries, keys, and
values in self-attention module as multiple 1x1 convolutions, followed by the
computation of attention weights and aggregation of the values. Therefore, the
first stage of both modules comprises a similar operation. More importantly,
the first stage contributes the dominant computational complexity (quadratic
in the channel size) compared with the second stage. This observation
naturally leads to an elegant integration of these two seemingly distinct
paradigms, i.e., a mixed model that enjoys the benefit of both self-Attention
and Convolution (ACmix), while having minimum computational overhead compared
to the pure convolution or self-attention counterpart. Extensive experiments
show that our model achieves consistently improved results over competitive
baselines on image recognition and downstream tasks. Code and pre-trained
models will be released at https://github.com/Panxuran/ACmix and
https://gitee.com/mindspore/models.
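The decomposition described in the abstract is easy to verify numerically. Below is a minimal sketch (assuming PyTorch; the tensor names and sizes are illustrative and not taken from the released ACmix code) that reproduces a k x k convolution as k^2 individual 1x1 convolutions followed by shift and summation, and shows the query/key/value projections of self-attention as 1x1 convolutions over the same feature map.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C_in, C_out, H, W, ks = 2, 4, 8, 10, 10, 3  # illustrative sizes; ks is the kernel size
pad = ks // 2

x = torch.randn(B, C_in, H, W)
weight = torch.randn(C_out, C_in, ks, ks)

# Reference: a standard ks x ks convolution with zero padding.
ref = F.conv2d(x, weight, padding=pad)

# Decomposition: one 1x1 convolution per kernel position (p, q), applied to a
# shifted view of the zero-padded input, then summed over all ks^2 positions.
xp = F.pad(x, (pad, pad, pad, pad))
out = torch.zeros_like(ref)
for p in range(ks):
    for q in range(ks):
        w_pq = weight[:, :, p, q, None, None]   # (C_out, C_in, 1, 1) kernel slice
        shifted = xp[:, :, p:p + H, q:q + W]    # shift by the position's offset
        out = out + F.conv2d(shifted, w_pq)     # 1x1 convolution, then summation

assert torch.allclose(out, ref, atol=1e-4)      # the two computations agree

# Stage 1 of self-attention: the query/key/value projections are themselves
# 1x1 convolutions over the feature map, so both paradigms share this stage.
wq, wk, wv = (torch.randn(C_in, C_in, 1, 1) for _ in range(3))
q_feat, k_feat, v_feat = (F.conv2d(x, w) for w in (wq, wk, wv))
```

Per the abstract's cost argument, this shared 1x1-projection stage carries the complexity that is quadratic in the channel size, which is why a mixed model such as ACmix can share it across the convolution and self-attention branches and keep only the lightweight shift/aggregation stages branch-specific.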
Related papers
- Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation.
Our findings demonstrate that Kashin Quantization achieves competitive or superior model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z)
- Impact of PolSAR pre-processing and balancing methods on complex-valued neural networks segmentation tasks [9.6556424340252]
We investigate the semantic segmentation of Polarimetric Synthetic Aperture Radar (PolSAR) data using Complex-Valued Neural Networks (CVNNs).
We exhaustively compare both methods for six model architectures, three complex-valued, and their respective real-equivalent models.
We propose two methods for reducing this gap and report results for all input representations, models, and dataset pre-processing methods.
arXiv Detail & Related papers (2022-10-28T12:49:43Z)
- ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
- Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation [25.387090319723715]
Training semantic segmentation models with few annotated samples has great potential in various real-world applications.
For the few-shot segmentation task, the main challenge is how to accurately measure the semantic correspondence between the support and query samples.
We propose to aggregate the learnable covariance matrices with a deformable 4D Transformer to effectively predict the segmentation map.
arXiv Detail & Related papers (2022-07-30T20:41:38Z)
- Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS).
Unlike previous work, our formulation allows to learn dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z)
- MixSiam: A Mixture-based Approach to Self-supervised Representation Learning [33.52892899982186]
Recently, contrastive learning has shown significant progress in learning visual representations from unlabeled data.
We propose MixSiam, a mixture-based approach built upon the traditional Siamese network.
arXiv Detail & Related papers (2021-11-04T08:12:47Z)
- X-volution: On the unification of convolution and self-attention [52.80459687846842]
We propose a multi-branch elementary module composed of both convolution and self-attention operations.
The proposed X-volution achieves highly competitive visual understanding improvements.
arXiv Detail & Related papers (2021-06-04T04:32:02Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
- GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering [9.722607434532883]
We propose a self-supervised Gaussian-attention network for image clustering (GATCluster).
Rather than extracting intermediate features first and then performing traditional clustering, GATCluster directly outputs semantic cluster labels without further post-processing.
We develop a two-step learning algorithm that is memory-efficient for clustering large-size images.
arXiv Detail & Related papers (2020-02-27T00:57:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.