More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using
Sparsity
- URL: http://arxiv.org/abs/2207.03620v1
- Date: Thu, 7 Jul 2022 23:55:52 GMT
- Title: More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using
Sparsity
- Authors: Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian
Wu, Mykola Pechenizkiy, Decebal Mocanu, Zhangyang Wang
- Abstract summary: Recently, a couple of advanced convolutional models strike back with large kernels motivated by the local but large attention mechanism.
We propose Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51x51 kernels that can perform on par with or better than state-of-the-art hierarchical Transformers.
- Score: 103.62784587778037
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transformers have quickly shined in the computer vision world since the
emergence of Vision Transformers (ViTs). The dominant role of convolutional
neural networks (CNNs) seems to be challenged by increasingly effective
transformer-based models. Very recently, a couple of advanced convolutional
models strike back with large kernels motivated by the local but large
attention mechanism, showing appealing performance and efficiency. While one of
them, i.e. RepLKNet, impressively manages to scale the kernel size to 31x31
with improved performance, the performance starts to saturate as the kernel
size continues growing, compared to the scaling trend of advanced ViTs such as
Swin Transformer. In this paper, we explore the possibility of training extreme
convolutions larger than 31x31 and test whether the performance gap can be
eliminated by strategically enlarging convolutions. This study ends up with a
recipe for applying extremely large kernels from the perspective of sparsity,
which can smoothly scale up kernels to 61x61 with better performance. Built on
this recipe, we propose Sparse Large Kernel Network (SLaK), a pure CNN
architecture equipped with 51x51 kernels that can perform on par with or better
than state-of-the-art hierarchical Transformers and modern ConvNet
architectures like ConvNeXt and RepLKNet, on ImageNet classification as well as
typical downstream tasks. Our code is available at
https://github.com/VITA-Group/SLaK.
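
The abstract's recipe can be pictured concretely: a very large depthwise kernel is replaced by two rectangular depthwise convolutions applied in parallel, with sparse weights keeping the parameter cost in check. Below is a minimal PyTorch sketch of that idea; the branch layout, N = 5, the static 90% sparsity mask, and all class names are illustrative assumptions rather than the authors' exact SLaK implementation (in the paper, the sparse kernels are trained dynamically and the network width is expanded, which this static sketch does not capture).

```python
# Minimal sketch: approximate an M x M depthwise kernel with M x N and N x M
# branches whose weights are masked to be sparse. Hyper-parameters are
# illustrative assumptions, not the authors' configuration.
import torch
import torch.nn as nn


class SparseRectConv(nn.Module):
    """Depthwise conv whose weights are multiplied by a fixed binary sparsity mask."""

    def __init__(self, channels, kernel_size, sparsity=0.9):
        super().__init__()
        self.conv = nn.Conv2d(
            channels, channels, kernel_size,
            padding=(kernel_size[0] // 2, kernel_size[1] // 2),
            groups=channels, bias=False,
        )
        # static random mask for illustration only; SLaK adapts sparsity during training
        mask = (torch.rand_like(self.conv.weight) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.conv2d(
            x, self.conv.weight * self.mask,
            padding=self.conv.padding, groups=self.conv.groups,
        )


class LargeKernelBlock(nn.Module):
    """Approximates an M x M depthwise kernel with parallel M x N and N x M branches."""

    def __init__(self, channels, m=51, n=5):
        super().__init__()
        self.horizontal = SparseRectConv(channels, (m, n))
        self.vertical = SparseRectConv(channels, (n, m))
        self.bn_h = nn.BatchNorm2d(channels)
        self.bn_v = nn.BatchNorm2d(channels)

    def forward(self, x):
        return self.bn_h(self.horizontal(x)) + self.bn_v(self.vertical(x))


x = torch.randn(1, 64, 56, 56)
print(LargeKernelBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```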
Related papers
- Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations [17.41381592056492]
This paper proposes the paradigm of large convolutional kernels in designing modern Convolutional Neural Networks (ConvNets)
We establish that employing a few large kernels, instead of stacking multiple smaller ones, can be a superior design strategy.
We propose the UniRepLKNet architecture, which offers systematic architecture design principles specifically crafted for large-kernel ConvNets.
arXiv Detail & Related papers (2024-10-10T15:43:55Z) - KernelWarehouse: Rethinking the Design of Dynamic Convolution [16.101179962553385]
KernelWarehouse redefines the basic concepts of "kernels", "assembling kernels" and "attention function".
We validate the effectiveness of KernelWarehouse on the ImageNet and MS-COCO datasets using various ConvNet architectures.
arXiv Detail & Related papers (2024-06-12T05:16:26Z) - PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution [35.1473732030645]
Inspired by human vision, we propose a human-like peripheral convolution that reduces the parameter count of dense grid convolution by over 90%.
Our peripheral convolution behaves much like human peripheral vision, reducing the parameter complexity of convolution from O(K^2) to O(log K) without hurting performance (a rough sketch of this parameter-sharing idea appears after this list).
For the first time, we successfully scale up the kernel size of CNNs to an unprecedented 101x101 and demonstrate consistent improvements.
arXiv Detail & Related papers (2024-03-12T12:19:05Z) - Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects [8.933264104073832]
Small convolutional kernels, combined with appropriate convolution operations, can closely approximate the effect of large kernel sizes.
We propose a shift-wise operator that ensures the CNNs capture long-range dependencies with the help of the sparse mechanism.
On the ImageNet-1k, our shift-wise enhanced CNN model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2024-01-23T13:13:45Z) - Are Large Kernels Better Teachers than Transformers for ConvNets? [82.4742785108714]
This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural Networks (ConvNets): as the teacher in Knowledge Distillation (KD) for small-kernel ConvNets.
arXiv Detail & Related papers (2023-05-30T21:05:23Z) - InceptionNeXt: When Inception Meets ConvNeXt [167.61042926444105]
We build a series of networks, namely InceptionNeXt, which not only enjoy high throughput but also maintain competitive performance.
InceptionNeXt achieves 1.6x higher training throughput than ConvNeXt-T, as well as a 0.2% top-1 accuracy improvement on ImageNet-1K.
arXiv Detail & Related papers (2023-03-29T17:59:58Z) - InternImage: Exploring Large-Scale Vision Foundation Models with
Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs.
arXiv Detail & Related papers (2022-11-10T18:59:04Z) - Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [148.0476219278875]
We revisit large kernel design in modern convolutional neural networks (CNNs)
Inspired by recent advances of vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm.
We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to commonly used 3x3.
arXiv Detail & Related papers (2022-03-13T17:22:44Z) - Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
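
As noted in the PeLK entry above, here is a rough, unofficial sketch of how a peripheral-style parameter-sharing scheme can expand a small parameter bank into a very large depthwise kernel, so that the number of distinct weights grows roughly logarithmically with kernel size. The band layout, the dense-centre width of 5, and all names are assumptions for illustration, not PeLK's actual scheme.

```python
# Sketch: positions near the kernel centre get individual parameters; positions
# further out share parameters in bands whose width doubles with distance.
import torch
import torch.nn as nn
import torch.nn.functional as F


def band_ids(k, dense=5):
    """Map k kernel positions to parameter ids: dense centre, exponentially wider bands outside."""
    half, d = k // 2, dense // 2
    ids = torch.empty(k, dtype=torch.long)
    nid = 0
    for off in range(-d, d + 1):          # dense centre: one id per position
        ids[half + off] = nid
        nid += 1
    for sign in (-1, 1):                  # peripheral bands on each side
        pos, width = d + 1, 1
        while pos <= half:
            for _ in range(width):
                if pos > half:
                    break
                ids[half + sign * pos] = nid
                pos += 1
            nid += 1
            width *= 2
    return ids, nid


class PeripheralDWConv(nn.Module):
    def __init__(self, channels, k=101, dense=5):
        super().__init__()
        ids, n = band_ids(k, dense)
        # 2-D parameter index = Cartesian product of the 1-D position-to-id map
        self.register_buffer("idx", (ids[:, None] * n + ids[None, :]).reshape(-1))
        self.bank = nn.Parameter(0.02 * torch.randn(channels, n * n))
        self.channels, self.k = channels, k

    def forward(self, x):
        # expand the small parameter bank into a full k x k depthwise kernel
        w = self.bank[:, self.idx].view(self.channels, 1, self.k, self.k)
        return F.conv2d(x, w, padding=self.k // 2, groups=self.channels)


x = torch.randn(1, 32, 64, 64)
print(PeripheralDWConv(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```

With k = 101 and a dense centre of 5, this layout uses only a few hundred distinct weights per channel instead of 101 x 101, which is where the claimed >90% parameter reduction comes from.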