Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects
- URL: http://arxiv.org/abs/2401.12736v1
- Date: Tue, 23 Jan 2024 13:13:45 GMT
- Title: Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects
- Authors: Dachong Li, Li Li, Zhuangzhuang Chen, Jianqiang Li
- Abstract summary: Small convolutional kernels and convolution operations can achieve effects close to those of large kernel sizes.
We propose a shift-wise operator that ensures CNNs capture long-range dependencies with the help of a sparse mechanism.
On ImageNet-1k, our shift-wise enhanced CNN model outperforms the state-of-the-art models.
- Score: 8.933264104073832
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent studies reveal that the remarkable performance of Vision transformers
(ViTs) benefits from large receptive fields. For this reason, the large
convolutional kernel design becomes an ideal solution to make Convolutional
Neural Networks (CNNs) great again. However, the typical large convolutional
kernels turn out to be hardware-unfriendly operators, resulting in reduced
compatibility across hardware platforms. Thus, it is unwise to simply
enlarge the convolutional kernel size. In this paper, we reveal that small
convolutional kernels and convolution operations can achieve effects close to
those of large kernel sizes. Then, we propose a shift-wise operator that
ensures that CNNs capture long-range dependencies with the help of a sparse
mechanism, while remaining hardware-friendly. Experimental results show that
our shift-wise operator significantly improves the accuracy of a regular CNN
while markedly reducing computational requirements. On ImageNet-1k, our
shift-wise enhanced CNN model outperforms the state-of-the-art models. Code &
models at https://github.com/lidc54/shift-wiseConv.
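The abstract does not spell out the operator's exact form, so the snippet below is only a minimal PyTorch sketch of the general idea it describes: several small depthwise convolutions whose outputs are spatially shifted by different offsets and then aggregated, so that their combined receptive field approaches that of one large kernel. The module name, the offset scheme, and the omission of the sparse branch selection are assumptions; the authors' actual implementation is in the linked repository.

```python
# Minimal, illustrative sketch of a "shift + small kernel" operator.
# NOT the authors' implementation; names and the offset scheme are hypothetical.
import torch
import torch.nn as nn


class ShiftWiseSketch(nn.Module):
    def __init__(self, channels: int, small_kernel: int = 5, num_shifts: int = 5):
        super().__init__()
        # One small depthwise convolution per shifted branch.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, small_kernel,
                      padding=small_kernel // 2, groups=channels, bias=False)
            for _ in range(num_shifts)
        )
        # Evenly spaced vertical offsets; together the shifted small kernels
        # cover roughly a (small_kernel * num_shifts)-tall receptive field.
        r = small_kernel * (num_shifts // 2)
        self.offsets = torch.linspace(-r, r, num_shifts).long().tolist()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.zeros_like(x)
        for conv, dy in zip(self.branches, self.offsets):
            # torch.roll shifts the branch output along the height axis; the
            # paper's sparse mechanism would additionally select which
            # branches/groups to keep, which is omitted here for brevity.
            out = out + torch.roll(conv(x), shifts=dy, dims=2)
        return out


if __name__ == "__main__":
    x = torch.randn(1, 32, 56, 56)
    print(ShiftWiseSketch(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```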
Related papers
- PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution [35.1473732030645]
Inspired by human vision, we propose a human-like peripheral convolution that reduces the parameter count of dense grid convolution by over 90%.
Our peripheral convolution behaves very similarly to human vision, reducing the complexity of convolution from O(K^2) to O(log K) without hurting performance.
For the first time, we successfully scale up the kernel size of CNNs to an unprecedented 101x101 and demonstrate consistent improvements.
arXiv Detail & Related papers (2024-03-12T12:19:05Z)
- InceptionNeXt: When Inception Meets ConvNeXt [167.61042926444105]
We build a series of networks, namely InceptionNeXt, which not only enjoy high throughputs but also maintain competitive performance.
InceptionNeXt achieves 1.6x higher training throughput than ConvNeXt-T, and attains a 0.2% top-1 accuracy improvement on ImageNet-1K.
arXiv Detail & Related papers (2023-03-29T17:59:58Z)
- ParCNetV2: Oversized Kernel with Enhanced Attention [60.141606180434195]
We introduce a convolutional neural network architecture named ParCNetV2.
It extends position-aware circular convolution (ParCNet) with oversized convolutions and strengthens attention through bifurcate gate units.
Our method outperforms other pure convolutional neural networks as well as neural networks hybridizing CNNs and transformers.
arXiv Detail & Related papers (2022-11-14T07:22:55Z)
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which, like ViTs, gains from increasing parameters and training data.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns from massive data with large-scale parameters, as ViTs do.
arXiv Detail & Related papers (2022-11-10T18:59:04Z)
- More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity [103.62784587778037]
Recently, a couple of advanced convolutional models have struck back with large kernels, motivated by the local but large attention mechanism.
We propose Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51x51 kernels that can perform on par with or better than state-of-the-art hierarchical Transformers.
arXiv Detail & Related papers (2022-07-07T23:55:52Z)
- Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [148.0476219278875]
We revisit large kernel design in modern convolutional neural networks (CNNs).
Inspired by recent advances of vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm.
We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to the commonly used 3x3.
arXiv Detail & Related papers (2022-03-13T17:22:44Z)
- Hyper-Convolutions via Implicit Kernels for Medical Imaging [18.98078260974008]
We present the hyper-convolution, a novel building block that implicitly encodes the convolutional kernel using spatial coordinates.
We demonstrate in our experiments that replacing regular convolutions with hyper-convolutions can improve performance with fewer parameters and increase robustness against noise (a generic sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-02-06T03:56:19Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
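For the hyper-convolution entry above, the mechanism it mentions (encoding the kernel implicitly from spatial coordinates) can be illustrated with a short, generic sketch: a tiny coordinate network maps normalized kernel positions to weights, so the parameter count depends on that network rather than on the kernel size. This is a hedged reconstruction of the idea, not the paper's code; the layer sizes and names below are assumptions.

```python
# Generic sketch of an "implicit kernel": an MLP maps normalized (dy, dx)
# kernel coordinates to weights. Illustrative only; sizes/names are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperConvSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 7, hidden: int = 32):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, kernel_size
        # Small network predicting one (out_ch * in_ch) weight vector
        # per kernel position.
        self.coord_net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, in_ch * out_ch),
        )
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, kernel_size),
            torch.linspace(-1, 1, kernel_size),
            indexing="ij",
        )
        self.register_buffer("coords", torch.stack([ys, xs], dim=-1))  # (k, k, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Generate the kernel on the fly, then run a standard convolution.
        w = self.coord_net(self.coords)                       # (k, k, out*in)
        w = w.view(self.k, self.k, self.out_ch, self.in_ch).permute(2, 3, 0, 1)
        return F.conv2d(x, w, padding=self.k // 2)


if __name__ == "__main__":
    x = torch.randn(1, 8, 32, 32)
    print(HyperConvSketch(8, 16)(x).shape)  # torch.Size([1, 16, 32, 32])
```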