Dual Complementary Dynamic Convolution for Image Recognition
- URL: http://arxiv.org/abs/2211.06163v1
- Date: Fri, 11 Nov 2022 12:32:12 GMT
- Title: Dual Complementary Dynamic Convolution for Image Recognition
- Authors: Longbin Yan, Yunxiao Qin, Shumin Liu, Jie Chen
- Abstract summary: We propose a novel two-branch dual complementary dynamic convolution (DCDC) operator for convolutional neural networks (CNNs).
The DCDC operator overcomes the limitations of vanilla convolution and of most existing dynamic convolutions, which capture only spatial-adaptive features.
Experiments show that DCDC operator-based ResNets (DCDC-ResNets) significantly outperform vanilla ResNets and most state-of-the-art dynamic convolutional networks on image classification.
- Score: 13.864357201410648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a powerful engine, vanilla convolution has driven major breakthroughs in
various computer vision tasks. However, it is both sample-agnostic and
content-agnostic, which limits the representation capacity of convolutional
neural networks (CNNs). In this paper, we model scene features, for the first
time, as a combination of local spatial-adaptive parts specific to each
individual sample and global shift-invariant parts shared by all samples, and
we propose a novel two-branch dual complementary dynamic convolution (DCDC)
operator to handle these two types of features flexibly. The DCDC operator
overcomes the limitations of vanilla convolution and of most existing dynamic
convolutions, which capture only spatial-adaptive features, and thus markedly
boosts the representation capacity of CNNs. Experiments show that DCDC-based
ResNets (DCDC-ResNets) significantly outperform vanilla ResNets and most
state-of-the-art dynamic convolutional networks on image classification, as
well as on downstream tasks including object detection and instance and
panoptic segmentation, while using fewer FLOPs and parameters.
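The abstract describes the two-branch decomposition only at a high level, so the following PyTorch snippet is a minimal sketch of the idea rather than the paper's actual DCDC design: a shift-invariant branch realized as an ordinary convolution, plus a spatial-adaptive branch whose per-pixel filters are predicted from the input. The 1x1 filter generator, the softmax over filter taps, and the additive fusion are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchDynamicConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        # Shift-invariant branch: weights shared across positions and samples.
        self.static_conv = nn.Conv2d(channels, channels, kernel_size,
                                     padding=kernel_size // 2)
        # Spatial-adaptive branch: predict one k*k filter per position
        # (shared across channels here to keep the sketch compact).
        self.filter_gen = nn.Conv2d(channels, kernel_size ** 2, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        static_out = self.static_conv(x)
        # Per-pixel filters, normalized over the k*k taps.
        filters = self.filter_gen(x).softmax(dim=1)            # (b, k*k, h, w)
        patches = F.unfold(x, self.k, padding=self.k // 2)     # (b, c*k*k, h*w)
        patches = patches.view(b, c, self.k ** 2, h * w)
        filters = filters.view(b, 1, self.k ** 2, h * w)
        dynamic_out = (patches * filters).sum(dim=2).view(b, c, h, w)
        return static_out + dynamic_out                        # fuse both branches

x = torch.randn(2, 64, 32, 32)
y = DualBranchDynamicConv(64)(x)   # -> torch.Size([2, 64, 32, 32])
```

Summation is the simplest complementary fusion here; the two branches could equally be concatenated or gated, and the paper should be consulted for its actual fusion and filter-generation scheme.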
Related papers
- TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition [71.6546914957701]
We propose a lightweight Dual Dynamic Token Mixer (D-Mixer) that aggregates global information and local details in an input-dependent way.
We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network.
In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy while requiring less than half of the computational cost.
arXiv Detail & Related papers (2023-10-30T09:35:56Z)
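For a concrete feel of input-dependent global/local mixing, here is a minimal sketch (not the actual D-Mixer): a local depthwise-convolution branch and a global pooled-context branch, blended per channel by gates predicted from the input. The pooled broadcast and sigmoid gating are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SimpleDualTokenMixer(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Local branch: depthwise convolution captures fine details.
        self.local = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Gate predicted from global context decides the per-channel blend.
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        pooled = x.mean(dim=(2, 3))                     # global context (b, c)
        g = self.gate(pooled).view(b, c, 1, 1)          # input-dependent blend
        global_branch = pooled.view(b, c, 1, 1).expand_as(x)
        return g * self.local(x) + (1 - g) * global_branch

x = torch.randn(2, 32, 14, 14)
y = SimpleDualTokenMixer(32)(x)   # -> torch.Size([2, 32, 14, 14])
```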
- SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution [16.56592303409295]
Dynamic convolution achieves better performance for efficient CNNs at the cost of a negligible increase in FLOPs.
We propose a new framework, Sparse Dynamic Convolution (SD-Conv), to naturally integrate these two paths.
arXiv Detail & Related papers (2022-04-05T14:03:54Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
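To illustrate the local-global constraint above, a hypothetical supervised cross-scale contrastive loss is sketched below: features from two encoder scales are projected into a shared space, and same-class feature vectors attract across scales while the rest repel. The projection heads, temperature, and InfoNCE form are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cross_scale_contrastive(feat_a, feat_b, labels, proj_a, proj_b, tau=0.1):
    # Project both scales into a common feature space and L2-normalize.
    za = F.normalize(proj_a(feat_a), dim=-1)                   # (n, d)
    zb = F.normalize(proj_b(feat_b), dim=-1)                   # (n, d)
    logits = za @ zb.t() / tau                                 # cross-scale similarities
    pos = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()  # same-class pairs
    log_prob = logits.log_softmax(dim=1)
    # Average log-likelihood of each anchor's positives (InfoNCE-style).
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

# Two scales with different widths, mapped to one 128-d space.
proj_a, proj_b = nn.Linear(256, 128), nn.Linear(512, 128)
fa, fb = torch.randn(32, 256), torch.randn(32, 512)
labels = torch.randint(0, 21, (32,))
loss = cross_scale_contrastive(fa, fb, labels, proj_a, proj_b)
```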
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- OneDConv: Generalized Convolution For Transform-Invariant Representation [76.15687106423859]
We propose OneDConv, a novel generalized one-dimensional convolutional operator.
It dynamically transforms the convolution kernels based on the input features in a computationally and parametrically efficient manner.
It improves the robustness and generalization of convolution without sacrificing the performance on common images.
arXiv Detail & Related papers (2022-01-15T07:44:44Z)
- Optimising for Interpretability: Convolutional Dynamic Alignment Networks [108.83345790813445]
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA Nets).
Their core building blocks are Dynamic Alignment Units (DAUs), which are optimised to transform their inputs with dynamically computed weight vectors that align with task-relevant patterns.
CoDA Nets model the classification prediction through a series of input-dependent linear transformations, allowing for linear decomposition of the output into individual input contributions.
arXiv Detail & Related papers (2021-09-27T12:39:46Z)
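As a rough illustration of the Dynamic Alignment Unit described above: the unit computes a weight vector from its own input and applies it linearly, so the output is an input-dependent linear function of the input whose contributions can be read off per dimension. The two-layer weight generator and the L2 normalization are assumptions of this sketch, not the CoDA Nets design.

```python
import torch
import torch.nn as nn

class DynamicAlignmentUnit(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int = 32):
        super().__init__()
        # Weight generator: maps the input to a weight vector of the
        # same dimensionality as the input itself.
        self.gen = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, in_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gen(x)
        # Bound the weight norm; maximizing the output then requires the
        # weight vector to align with task-relevant input patterns.
        w = w / (w.norm(dim=-1, keepdim=True) + 1e-6)
        return (w * x).sum(dim=-1, keepdim=True)

x = torch.randn(4, 16)
y = DynamicAlignmentUnit(16)(x)   # (4, 1): one output per sample
# Since y = w(x) . x, dimension i contributes exactly w_i * x_i to y,
# which is the linear decomposition property mentioned above.
```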
- DMSANet: Dual Multi Scale Attention Network [0.0]
We propose a new attention module that not only achieves the best performance but also has fewer parameters than most existing models.
Our attention module can easily be integrated with other convolutional neural networks because of its lightweight nature.
arXiv Detail & Related papers (2021-06-13T10:31:31Z)
- Dynamic Clone Transformer for Efficient Convolutional Neural Networks [0.0]
We introduce a novel concept termed multi-path fully connected pattern (MPFC) to rethink the interdependencies of topology pattern, accuracy, and efficiency for ConvNets.
Inspired by MPFC, we propose a dual-branch module named dynamic clone transformer (DCT) where one branch generates multiple replicas from the input and the other branch transforms those clones through a series of difference vectors conditioned on the input itself to produce more variants.
arXiv Detail & Related papers (2021-06-12T13:42:28Z)
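A minimal sketch of the dual-branch clone idea, under stated assumptions: one branch replicates the input k times; the other predicts k difference vectors from the input and adds them to the clones to produce k distinct variants. The linear difference generator and additive reform step are illustrative choices, not the DCT design.

```python
import torch
import torch.nn as nn

class DynamicCloneBlock(nn.Module):
    def __init__(self, dim: int, num_clones: int = 4):
        super().__init__()
        self.k = num_clones
        # Difference branch: one delta vector per clone, conditioned on x.
        self.diff = nn.Linear(dim, dim * num_clones)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, d = x.shape
        clones = x.unsqueeze(1).expand(b, self.k, d)   # replica branch
        deltas = self.diff(x).view(b, self.k, d)       # difference branch
        return clones + deltas                         # (b, k, d) variants

x = torch.randn(8, 64)
variants = DynamicCloneBlock(64)(x)   # -> torch.Size([8, 4, 64])
```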
- X-volution: On the unification of convolution and self-attention [52.80459687846842]
We propose a multi-branch elementary module composed of both convolution and self-attention operations.
The proposed X-volution achieves highly competitive improvements on visual understanding tasks.
arXiv Detail & Related papers (2021-06-04T04:32:02Z)
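To make the multi-branch idea above concrete, here is a minimal sketch that pairs a local convolution branch with a global self-attention branch over spatial positions; the single attention head and the additive fusion are assumptions, not X-volution's actual unification scheme.

```python
import torch
import torch.nn as nn

class ConvAttentionBranch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)   # local branch
        self.attn = nn.MultiheadAttention(channels, num_heads=1,
                                          batch_first=True)       # global branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        conv_out = self.conv(x)
        tokens = x.flatten(2).transpose(1, 2)            # (b, h*w, c)
        attn_out, _ = self.attn(tokens, tokens, tokens)  # self-attention over positions
        attn_out = attn_out.transpose(1, 2).view(b, c, h, w)
        return conv_out + attn_out                       # fuse local and global

x = torch.randn(2, 32, 8, 8)
y = ConvAttentionBranch(32)(x)   # -> torch.Size([2, 32, 8, 8])
```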
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under comparable parameter complexity, outperforms the visual transformer DeiT-B by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
- Dynamic Group Convolution for Accelerating Convolutional Neural Networks [23.644124360336754]
We propose dynamic group convolution (DGC), which adaptively selects which input channels to connect within each group.
Multiple groups can adaptively capture abundant and complementary visual/semantic features for each input image.
DGC preserves the original network structure while achieving computational efficiency similar to conventional group convolution.
arXiv Detail & Related papers (2020-07-08T16:35:44Z)
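As a rough sketch of the channel-selection idea in DGC: a lightweight head scores the input channels, and only the top-scoring channels within each group are kept before a grouped convolution. The pooling-based scoring head and the hard top-k mask are assumptions for illustration (training would typically need a soft or straight-through relaxation); the paper's actual gating scheme may differ.

```python
import torch
import torch.nn as nn

class DynamicGroupGate(nn.Module):
    def __init__(self, channels: int = 64, groups: int = 4, keep_per_group: int = 8):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.keep = keep_per_group
        self.score = nn.Linear(channels, channels)   # per-channel saliency scores
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Score channels from globally pooled features.
        s = self.score(x.mean(dim=(2, 3)))                  # (b, c)
        s = s.view(b, self.groups, c // self.groups)
        # Hard top-k selection per group (non-differentiable as written).
        idx = s.topk(self.keep, dim=-1).indices
        mask = torch.zeros_like(s).scatter_(-1, idx, 1.0)
        # Zero out unselected channels, then run the grouped convolution.
        return self.conv(x * mask.view(b, c, 1, 1))

x = torch.randn(2, 64, 16, 16)
y = DynamicGroupGate()(x)   # -> torch.Size([2, 64, 16, 16])
```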