$ShiftwiseConv:$ Small Convolutional Kernel with Large Kernel Effect
- URL: http://arxiv.org/abs/2401.12736v2
- Date: Thu, 13 Mar 2025 09:35:17 GMT
- Title: $ShiftwiseConv:$ Small Convolutional Kernel with Large Kernel Effect
- Authors: Dachong Li, Li Li, Zhuangzhuang Chen, Jianqiang Li
- Abstract summary: Large kernels make standard convolutional neural networks (CNNs) great again over transformer architectures in various vision tasks. Recent studies meticulously designed around increasing kernel size have shown diminishing returns or stagnation in performance. In this paper, we reveal that the key hidden factors of large kernels can be summarized as two separate components: extracting features at a certain granularity and fusing features by multiple pathways.
- Score: 8.177438505492548
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large kernels make standard convolutional neural networks (CNNs) great again over transformer architectures in various vision tasks. Nonetheless, recent studies meticulously designed around increasing kernel size have shown diminishing returns or stagnation in performance. Thus, the hidden factors of large kernel convolution that affect model performance remain unexplored. In this paper, we reveal that the key hidden factors of large kernels can be summarized as two separate components: extracting features at a certain granularity and fusing features by multiple pathways. To this end, we leverage the multi-path long-distance sparse dependency relationship to enhance feature utilization via the proposed Shiftwise (SW) convolution operator with a pure CNN architecture. In a wide range of vision tasks such as classification, segmentation, and detection, SW surpasses state-of-the-art transformers and CNN architectures, including SLaK and UniRepLKNet. More importantly, our experiments demonstrate that $3 \times 3$ convolutions can replace large convolutions in existing large kernel CNNs to achieve comparable effects, which may inspire follow-up works. Code and all the models at https://github.com/lidc54/shift-wiseConv.
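The abstract's claim that small kernels can stand in for a large one rests on a simple linearity argument, which can be sketched in one dimension. The snippet below is an illustration only, not the authors' SW operator (which additionally involves sparsity and multiple pathways); the function names `large_kernel_1d` and `shifted_small_kernels_1d` are hypothetical, and the 1-D setting is an assumption made for brevity. It splits a length-9 kernel into three length-3 kernels and shows that summing their responses, each read at a shifted offset, reproduces the large-kernel output.

```python
import numpy as np

def large_kernel_1d(x, w):
    """Zero-padded 'same' cross-correlation of x with an odd-length kernel w."""
    pad = len(w) // 2
    return np.correlate(np.pad(x, pad), w, mode="valid")

def shifted_small_kernels_1d(x, w, k=3):
    """Same output, computed from len(w) // k kernels of size k.

    Each small kernel is one contiguous chunk of w; its response is read
    at an offset of j * k positions, which plays the role of the shift.
    """
    K, n = len(w), len(x)
    assert K % 2 == 1 and K % k == 0, "illustration assumes odd K divisible by k"
    xp = np.pad(x, K // 2)                  # pad once for the large receptive field
    out = np.zeros(n)
    for j in range(K // k):
        small = w[j * k:(j + 1) * k]        # j-th chunk of the large kernel
        response = np.correlate(xp, small, mode="valid")  # length n + K - k
        out += response[j * k:j * k + n]    # shifted read, then accumulate
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
w = rng.standard_normal(9)
assert np.allclose(large_kernel_1d(x, w), shifted_small_kernels_1d(x, w))
```

The identity is exact here because the small kernels are dense chunks of the large one; the paper's contribution concerns which of these shifted paths to keep and how to fuse them, which this sketch does not model.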
Related papers
- D-Net: Dynamic Large Kernel with Dynamic Feature Fusion for Volumetric Medical Image Segmentation [7.894630378784007]
We propose Dynamic Large Kernel (DLK) and Dynamic Feature Fusion (DFF) modules.
D-Net is able to effectively utilize a multi-scale large receptive field and adaptively harness global contextual information.
arXiv Detail & Related papers (2024-03-15T20:49:43Z) - PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution [35.1473732030645]
Inspired by human vision, we propose a human-like peripheral convolution that efficiently reduces the parameter count of dense grid convolution by over 90%.
Our peripheral convolution behaves much like human peripheral vision, reducing the complexity of convolution from $O(K^2)$ to $O(\log K)$ without hurting performance.
For the first time, we successfully scale up the kernel size of CNNs to an unprecedented 101x101 and demonstrate consistent improvements.
arXiv Detail & Related papers (2024-03-12T12:19:05Z) - InceptionNeXt: When Inception Meets ConvNeXt [167.61042926444105]
We build a series of networks, namely InceptionNeXt, which not only enjoy high throughput but also maintain competitive performance.
InceptionNeXt achieves 1.6x higher training throughput than ConvNeXt-T and attains a 0.2% top-1 accuracy improvement on ImageNet-1K.
arXiv Detail & Related papers (2023-03-29T17:59:58Z) - ParCNetV2: Oversized Kernel with Enhanced Attention [60.141606180434195]
We introduce a convolutional neural network architecture named ParCNetV2.
It extends position-aware circular convolution (ParCNet) with oversized convolutions and strengthens attention through bifurcate gate units.
Our method outperforms other pure convolutional neural networks as well as neural networks hybridizing CNNs and transformers.
arXiv Detail & Related papers (2022-11-14T07:22:55Z) - InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs.
arXiv Detail & Related papers (2022-11-10T18:59:04Z) - Salient Object Detection via Dynamic Scale Routing [62.26677215668959]
This paper introduces "dynamic" scale routing as a brand-new idea.
It will result in a generic plug-in that could directly fit the existing feature backbone.
We provide a self-adaptive bidirectional decoder design to accommodate the DPConv-based encoder best.
arXiv Detail & Related papers (2022-10-25T08:01:27Z) - Omni-Dimensional Dynamic Convolution [25.78940854339179]
Learning a single static convolutional kernel in each convolutional layer is the common training paradigm of modern Convolutional Neural Networks (CNNs).
Recent research in dynamic convolution shows that learning a linear combination of $n$ convolutional kernels weighted with their input-dependent attentions can significantly improve the accuracy of light-weight CNNs.
We present Omni-dimensional Dynamic Convolution (ODConv), a more generalized yet elegant dynamic convolution design.
arXiv Detail & Related papers (2022-09-16T14:05:38Z) - More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity [103.62784587778037]
Recently, a couple of advanced convolutional models strike back with large kernels motivated by the local but large attention mechanism.
We propose Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51x51 kernels that can perform on par with or better than state-of-the-art hierarchical Transformers.
arXiv Detail & Related papers (2022-07-07T23:55:52Z) - Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [148.0476219278875]
We revisit large kernel design in modern convolutional neural networks (CNNs).
Inspired by recent advances of vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm.
We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to commonly used 3x3.
arXiv Detail & Related papers (2022-03-13T17:22:44Z) - Hyper-Convolutions via Implicit Kernels for Medical Imaging [18.98078260974008]
We present the hyper-convolution, a novel building block that implicitly encodes the convolutional kernel using spatial coordinates.
We demonstrate in our experiments that replacing regular convolutions with hyper-convolutions can improve performance with fewer parameters and increase robustness against noise.
arXiv Detail & Related papers (2022-02-06T03:56:19Z) - Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z) - CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z) - PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer [76.44375136492827]
Convolutional Neural Networks (CNNs) are often scale-sensitive.
We bridge this regret by exploiting multi-scale features in a finer granularity.
The proposed convolution operation, named Poly-Scale Convolution (PSConv), mixes up a spectrum of dilation rates.
arXiv Detail & Related papers (2020-07-13T05:14:11Z) - XSepConv: Extremely Separated Convolution [60.90871656244126]
We propose a novel extremely separated convolutional block (XSepConv).
It fuses spatially separable convolutions into depthwise convolution to reduce both the computational cost and parameter size of large kernels.
XSepConv is designed to be an efficient alternative to vanilla depthwise convolution with large kernel sizes.
arXiv Detail & Related papers (2020-02-27T11:46:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.