Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
- URL: http://arxiv.org/abs/2203.06717v2
- Date: Thu, 17 Mar 2022 15:26:54 GMT
- Title: Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
- Authors: Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang
Ding, Jian Sun
- Abstract summary: We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), we demonstrate that using a few large convolutional kernels instead of a stack of small kernels can be a more powerful paradigm. We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to the commonly used 3x3.
- Score: 148.0476219278875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit large kernel design in modern convolutional neural networks
(CNNs). Inspired by recent advances in vision transformers (ViTs), in this
paper, we demonstrate that using a few large convolutional kernels instead of a
stack of small kernels can be a more powerful paradigm. We propose five
guidelines, e.g., applying re-parameterized large depth-wise convolutions, for
designing efficient, high-performance large-kernel CNNs. Following these guidelines,
we propose RepLKNet, a pure CNN architecture whose kernel size is as large as
31x31, in contrast to the commonly used 3x3. RepLKNet greatly closes the
performance gap between CNNs and ViTs, e.g., achieving results comparable or
superior to Swin Transformer on ImageNet and several typical downstream tasks,
with lower latency. RepLKNet also scales well to big data and large
models, obtaining 87.8% top-1 accuracy on ImageNet and 56.0% mIoU on ADE20K,
which is highly competitive among state-of-the-art models of similar size.
Our study further reveals that, in contrast to small-kernel CNNs, large-kernel
CNNs have much larger effective receptive fields and a higher shape bias rather
than texture bias. Code & models at
https://github.com/megvii-research/RepLKNet.
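The re-parameterization guideline mentioned in the abstract can be illustrated with a minimal sketch. Assumption: RepLKNet-style training runs a large depth-wise kernel in parallel with a small one, and at inference the small kernel is merged into the large one by zero-padded addition, so a single convolution replaces the two branches. The sketch below is in 1D pure Python for clarity; the function names are illustrative, not the paper's API.

```python
def conv1d(x, k):
    """'Same'-padded 1D convolution (cross-correlation) of signal x with odd-length kernel k."""
    pad = len(k) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(xp[i + j] * k[j] for j in range(len(k))) for i in range(len(x))]

def merge_kernels(k_large, k_small):
    """Zero-pad the small kernel to the large size and add it in: by linearity,
    the merged kernel reproduces the sum of the two branch outputs exactly."""
    off = (len(k_large) - len(k_small)) // 2
    merged = list(k_large)
    for j, v in enumerate(k_small):
        merged[off + j] += v
    return merged

k_large = [0.1] * 31                    # stand-in for a trained 31-tap kernel
k_small = [0.5, -1.0, 0.5, -1.0, 0.5]   # stand-in 5-tap parallel branch
x = [float(i % 7) for i in range(64)]

two_branch = [a + b for a, b in zip(conv1d(x, k_large), conv1d(x, k_small))]
one_branch = conv1d(x, merge_kernels(k_large, k_small))
assert all(abs(a - b) < 1e-9 for a, b in zip(two_branch, one_branch))
```

Because convolution is linear in the kernel, the merge is exact, which is why the small-kernel branch is free at inference time.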
Related papers
- Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations [17.41381592056492]
This paper proposes the paradigm of large convolutional kernels in designing modern Convolutional Neural Networks (ConvNets).
We establish that employing a few large kernels, instead of stacking multiple smaller ones, can be a superior design strategy.
We propose the UniRepLKNet architecture, which offers systematic architecture design principles specifically crafted for large-kernel ConvNets.
arXiv Detail & Related papers (2024-10-10T15:43:55Z) - OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve.
We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap.
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z) - PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution [35.1473732030645]
Inspired by human vision, we propose a human-like peripheral convolution that efficiently eliminates over 90% of the parameters of dense grid convolution.
Our peripheral convolution behaves much like human vision, reducing the parameter complexity of convolution from O(K^2) to O(log K) without degrading performance.
For the first time, we successfully scale up the kernel size of CNNs to an unprecedented 101x101 and demonstrate consistent improvements.
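The O(K^2)-to-O(log K) claim rests on parameter sharing: peripheral kernel positions share weights in bands that widen exponentially with distance from the center. The sketch below shows this idea in 1D pure Python; the band layout and names are illustrative assumptions, not PeLK's actual implementation.

```python
def band_index(offset):
    """Map a center offset to a shared-parameter band: offsets 0 and 1 get
    individual weights; farther positions share one weight per band whose
    width doubles with distance, so the band count grows logarithmically."""
    d = abs(offset)
    if d <= 1:
        return d
    return 1 + d.bit_length()

def expand_kernel(params, K):
    """Materialize a dense K-tap kernel (K odd) from the shared band weights."""
    half = K // 2
    return [params[band_index(o)] for o in range(-half, half + 1)]

K = 101
n_bands = max(band_index(o) for o in range(K // 2 + 1)) + 1
params = [0.01 * (i + 1) for i in range(n_bands)]  # stand-in trained weights
kernel = expand_kernel(params, K)
assert len(kernel) == K
assert n_bands < K // 4  # far fewer distinct parameters than kernel taps
```

With K = 101, only 8 distinct band weights materialize all 101 taps, which is the logarithmic scaling the entry describes.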
arXiv Detail & Related papers (2024-03-12T12:19:05Z) - Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects [8.933264104073832]
Small convolutional kernels and convolution operations can approximate the effects of large kernel sizes.
We propose a shift-wise operator that lets CNNs capture long-range dependencies with the help of a sparse mechanism.
On ImageNet-1k, our shift-wise enhanced CNN model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2024-01-23T13:13:45Z) - InternImage: Exploring Large-Scale Vision Foundation Models with
Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which, like ViTs, can gain from increasing parameters and training data.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data, as ViTs do.
arXiv Detail & Related papers (2022-11-10T18:59:04Z) - More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using
Sparsity [103.62784587778037]
Recently, several advanced convolutional models have struck back with large kernels motivated by the local but large attention mechanism.
We propose Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51x51 kernels that can perform on par with or better than state-of-the-art hierarchical Transformers.
arXiv Detail & Related papers (2022-07-07T23:55:52Z) - LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs [78.25819070166351]
We propose the spatial-wise partition convolution and its large-kernel module.
Our large-kernel 3D CNN network, LargeKernel3D, yields notable improvement in 3D tasks.
For the first time, we show that large kernels are feasible and essential for 3D visual tasks.
arXiv Detail & Related papers (2022-06-21T17:35:57Z) - Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets [65.28292822614418]
The giant formula for simultaneously enlarging the resolution, depth and width provides a Rubik's cube for neural networks.
This paper aims to explore the twisting rules for obtaining deep neural networks with minimum model sizes and computational costs.
arXiv Detail & Related papers (2020-10-28T08:49:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.