X-volution: On the unification of convolution and self-attention
- URL: http://arxiv.org/abs/2106.02253v2
- Date: Mon, 7 Jun 2021 09:03:46 GMT
- Title: X-volution: On the unification of convolution and self-attention
- Authors: Xuanhong Chen and Hang Wang and Bingbing Ni
- Abstract summary: We propose a multi-branch elementary module composed of both convolution and self-attention operations.
The proposed X-volution achieves highly competitive visual understanding improvements.
- Score: 52.80459687846842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolution and self-attention act as two fundamental building blocks
in deep neural networks: the former extracts local image features in a
linear way, while the latter non-locally encodes high-order contextual
relationships. Though the two are essentially complementary (first- vs.
high-order), state-of-the-art architectures such as CNNs and transformers
lack a principled way to apply both operations simultaneously in a single
computational module, owing to their heterogeneous computing patterns and
the excessive cost of global dot-products in visual tasks. In this work, we
theoretically derive a global self-attention approximation scheme, which
approximates self-attention via convolution on transformed features. Based
on this approximation, we establish a multi-branch elementary module
composed of both convolution and self-attention operations, capable of
unifying local and non-local feature interaction. Importantly, once
trained, this multi-branch module can be conditionally converted into a
single standard convolution via structural re-parameterization, yielding a
pure convolution-style operator named X-volution that is ready to be
plugged into any modern network as an atomic operation. Extensive
experiments demonstrate that the proposed X-volution achieves highly
competitive visual understanding improvements (+1.2% top-1 accuracy on
ImageNet classification, +1.7 box AP and +1.5 mask AP on COCO detection and
segmentation).
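
To make the multi-branch design concrete, below is a minimal PyTorch sketch. The shift-and-multiply branch is a simplified stand-in for the paper's convolution-based self-attention approximation (Pixel Shift Self-Attention); module names, the shift scheme, and all hyper-parameters are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ApproxAttention(nn.Module):
    """Crude stand-in for the paper's shift-based attention approximation:
    shift features in four directions, modulate by the original feature
    (approximating the query-key interaction), then mix with a 1x1 conv."""
    def __init__(self, channels: int, shift: int = 1):
        super().__init__()
        self.shift = shift
        self.proj = nn.Conv2d(4 * channels, channels, kernel_size=1)

    def forward(self, x):
        s = self.shift
        shifted = [
            F.pad(x, (s, -s, 0, 0)),  # shift right (negative pad crops)
            F.pad(x, (-s, s, 0, 0)),  # shift left
            F.pad(x, (0, 0, s, -s)),  # shift down
            F.pad(x, (0, 0, -s, s)),  # shift up
        ]
        ctx = torch.cat([x * t for t in shifted], dim=1)
        return self.proj(ctx)

class XVolutionBlock(nn.Module):
    """Multi-branch module: local conv branch + approximated attention branch."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attn = ApproxAttention(channels)

    def forward(self, x):
        # Summing the branches unifies local (conv) and non-local (attention)
        # feature interaction in one module.
        return self.conv(x) + self.attn(x)

x = torch.randn(2, 64, 32, 32)
print(XVolutionBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

The paper's structural re-parameterization then conditionally folds the trained branches into one standard convolution; that conversion step is omitted from this sketch.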
Related papers
- TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition [71.6546914957701]
We propose a lightweight Dual Dynamic Token Mixer (D-Mixer) that aggregates global information and local details in an input-dependent way.
We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network.
In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy while requiring less than half of the computational cost.
arXiv Detail & Related papers (2023-10-30T09:35:56Z)
- Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention [34.26177289099421]
The self-attention mechanism has been a key factor in the recent progress of Vision Transformers (ViTs).
We propose a novel local attention module that leverages common convolution operations to achieve high efficiency, flexibility, and generalizability.
Our module realizes the local attention paradigm in an efficient and flexible manner (a sketch of convolution-based local attention follows this list).
arXiv Detail & Related papers (2023-04-09T13:37:59Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- On the Integration of Self-Attention and Convolution [33.899471118470416]
Convolution and self-attention are powerful techniques for representation learning.
In this paper, we show that there exists a strong underlying relation between them.
We show that the bulk of the computation in these two paradigms is in fact done with the same operation.
arXiv Detail & Related papers (2021-11-29T14:37:05Z)
- Involution: Inverting the Inherence of Convolution for Visual Recognition [72.88582255910835]
We present a novel atomic operation for deep neural networks, coined involution, obtained by inverting the design principles of convolution (a minimal sketch follows this list).
The proposed involution operator can serve as a fundamental brick for building a new generation of neural networks for visual recognition.
Our involution-based models improve on ResNet-50-based convolutional baselines by up to 1.6% top-1 accuracy, 2.5% and 2.4% bounding box AP, and 4.7% mean IoU, absolutely.
arXiv Detail & Related papers (2021-03-10T18:40:46Z)
- Self-grouping Convolutional Neural Networks [30.732298624941738]
We propose a novel method for designing self-grouping convolutional neural networks, called SG-CNN.
For each filter, we first evaluate the importance of its input channels to form an importance vector; clustering these vectors yields data-dependent centroids.
Using the resulting centroids, we prune the less important connections, which implicitly minimizes the accuracy loss of pruning (a rough sketch follows this list).
arXiv Detail & Related papers (2020-09-29T06:24:32Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across the convolutional kernels of a CNN, decomposing each kernel into shared atoms and per-kernel coefficients (a sketch follows this list).
We show that CNNs then maintain performance with a dramatic reduction in parameters and computation.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
- Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose using evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we exploit the idea of group convolution to design efficient 1-bit convolutional neural networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)
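
For the Slide-Transformer entry above: a didactic sketch of local (sliding-window) attention expressed with common convolution machinery, here the im2col-style F.unfold primitive. This illustrates the general idea only and is not the paper's optimized shift-based implementation; window size and projection layout are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalAttention(nn.Module):
    def __init__(self, dim: int, window: int = 3):
        super().__init__()
        self.window = window
        self.scale = dim ** -0.5                            # dot-product scaling
        self.qkv = nn.Conv2d(dim, 3 * dim, kernel_size=1)   # q, k, v projections

    def forward(self, x):
        B, C, H, W = x.shape
        k2 = self.window * self.window
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Gather each pixel's kxk neighborhood for keys and values (im2col).
        k = F.unfold(k, self.window, padding=self.window // 2)  # B, C*k2, H*W
        v = F.unfold(v, self.window, padding=self.window // 2)
        k = k.view(B, C, k2, H * W)
        v = v.view(B, C, k2, H * W)
        q = q.view(B, C, 1, H * W)
        # Attention restricted to the local window around each query pixel.
        attn = (q * k).sum(dim=1, keepdim=True) * self.scale    # B, 1, k2, H*W
        attn = attn.softmax(dim=2)
        out = (attn * v).sum(dim=2)                             # B, C, H*W
        return out.view(B, C, H, W)

x = torch.randn(1, 32, 16, 16)
print(LocalAttention(32)(x).shape)  # torch.Size([1, 32, 16, 16])
```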
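For the Involution entry above: a minimal sketch of an involution layer, whose kernel is generated from the input at each spatial location (spatially specific) and shared across channels within a group (channel-agnostic). The reduction ratio and group count here are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Involution(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3,
                 groups: int = 4, reduction: int = 4):
        super().__init__()
        self.k, self.groups = kernel_size, groups
        # Kernel-generation function: two 1x1 convs with a bottleneck.
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction,
                              groups * kernel_size * kernel_size, 1)

    def forward(self, x):
        B, C, H, W = x.shape
        k2 = self.k * self.k
        # Generate one kxk kernel per location per group from the input itself.
        kernels = self.span(F.relu(self.reduce(x)))         # B, G*k2, H, W
        kernels = kernels.view(B, self.groups, 1, k2, H, W)
        # Unfold the input into kxk neighborhoods.
        patches = F.unfold(x, self.k, padding=self.k // 2)  # B, C*k2, H*W
        patches = patches.view(B, self.groups, C // self.groups, k2, H, W)
        # Multiply-accumulate over the window: content-adaptive aggregation.
        out = (kernels * patches).sum(dim=3)                # B, G, C/G, H, W
        return out.view(B, C, H, W)

x = torch.randn(1, 32, 16, 16)
print(Involution(32)(x).shape)  # torch.Size([1, 32, 16, 16])
```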
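For the Self-grouping CNN entry above: a rough sketch of the grouping-then-pruning idea, assuming L1 kernel norms as channel-importance scores and a tiny k-means for clustering. Both choices are stand-ins, not the paper's exact criteria.

```python
import torch

def self_group_prune(weight: torch.Tensor, num_groups: int = 4,
                     keep_ratio: float = 0.5) -> torch.Tensor:
    """weight: (out_ch, in_ch, kh, kw). Returns a pruned copy of the weights."""
    O, I = weight.shape[:2]
    # Importance of each input channel for each filter: L1 norm of its kernel.
    imp = weight.abs().sum(dim=(2, 3))                       # (O, I)
    # Tiny k-means: cluster filters by their importance vectors.
    centroids = imp[torch.randperm(O)[:num_groups]].clone()  # (G, I)
    for _ in range(10):
        assign = torch.cdist(imp, centroids).argmin(dim=1)   # (O,)
        for g in range(num_groups):
            if (assign == g).any():
                centroids[g] = imp[assign == g].mean(dim=0)
    # Within each group, drop connections to the least important channels.
    pruned = weight.clone()
    keep = int(I * keep_ratio)
    for g in range(num_groups):
        drop = centroids[g].argsort()[:I - keep]             # least important
        rows = (assign == g).nonzero(as_tuple=True)[0]
        pruned[rows[:, None], drop[None, :]] = 0
    return pruned

w = torch.randn(16, 8, 3, 3)
print((self_group_prune(w) == 0).float().mean())  # ~0.5 of weights zeroed
```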
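For the ACDC entry above: a sketch of atom-coefficient decomposed convolution, where every kernel is a linear combination of a small shared dictionary of spatial atoms. Dictionary size and initialization here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomCoefficientConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_atoms: int = 6):
        super().__init__()
        # Shared dictionary of num_atoms spatial kxk atoms.
        self.atoms = nn.Parameter(torch.randn(num_atoms, k, k) * 0.1)
        # Per-(out_ch, in_ch) mixing coefficients: fewer parameters than
        # storing out_ch * in_ch independent kxk kernels when k*k > num_atoms.
        self.coeffs = nn.Parameter(torch.randn(out_ch, in_ch, num_atoms) * 0.1)
        self.k = k

    def forward(self, x):
        # Compose full kernels on the fly: W[o,i] = sum_a coeffs[o,i,a] * atoms[a]
        weight = torch.einsum('oia,akl->oikl', self.coeffs, self.atoms)
        return F.conv2d(x, weight, padding=self.k // 2)

x = torch.randn(1, 16, 8, 8)
print(AtomCoefficientConv(16, 32)(x).shape)  # torch.Size([1, 32, 8, 8])
```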
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.