Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects
- URL: http://arxiv.org/abs/2401.12736v1
- Date: Tue, 23 Jan 2024 13:13:45 GMT
- Title: Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects
- Authors: Dachong Li, Li Li, Zhuangzhuang Chen, Jianqiang Li
- Abstract summary: Small convolutional kernels and convolution operations can achieve effects close to those of large kernel sizes.
We propose a shift-wise operator that ensures CNNs capture long-range dependencies with the help of a sparse mechanism.
On ImageNet-1k, our shift-wise enhanced CNN model outperforms the state-of-the-art models.
- Score: 8.933264104073832
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent studies reveal that the remarkable performance of Vision transformers
(ViTs) benefits from large receptive fields. For this reason, the large
convolutional kernel design becomes an ideal solution to make Convolutional
Neural Networks (CNNs) great again. However, the typical large convolutional
kernels turn out to be hardware-unfriendly operators, resulting in reduced
compatibility across hardware platforms. Thus, it is unwise to simply
enlarge the convolutional kernel size. In this paper, we reveal that small
convolutional kernels and convolution operations can achieve effects close to
those of large kernel sizes. Then, we propose a shift-wise operator that
ensures that CNNs capture long-range dependencies with the help of a sparse
mechanism, while remaining hardware-friendly. Experimental results show that
our shift-wise operator significantly improves the accuracy of a regular CNN
while markedly reducing computational requirements. On ImageNet-1k, our
shift-wise enhanced CNN model outperforms the state-of-the-art models. Code &
models at https://github.com/lidc54/shift-wiseConv.
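The abstract does not spell out the operator's exact form, so the snippet below is only a minimal PyTorch sketch of the general idea it describes: several small depthwise convolutions whose outputs are spatially shifted by different offsets and then aggregated, so that their combined receptive field approaches that of one large kernel. The module name, the offset scheme, and the omission of the sparse branch selection are assumptions; the authors' actual implementation is in the linked repository.

```python
# Minimal, illustrative sketch of a "shift + small kernel" operator.
# NOT the authors' implementation; names and the offset scheme are hypothetical.
import torch
import torch.nn as nn


class ShiftWiseSketch(nn.Module):
    def __init__(self, channels: int, small_kernel: int = 5, num_shifts: int = 5):
        super().__init__()
        # One small depthwise convolution per shifted branch.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, small_kernel,
                      padding=small_kernel // 2, groups=channels, bias=False)
            for _ in range(num_shifts)
        )
        # Evenly spaced vertical offsets; together the shifted small kernels
        # cover roughly a (small_kernel * num_shifts)-tall receptive field.
        r = small_kernel * (num_shifts // 2)
        self.offsets = torch.linspace(-r, r, num_shifts).long().tolist()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.zeros_like(x)
        for conv, dy in zip(self.branches, self.offsets):
            # torch.roll shifts the branch output along the height axis; the
            # paper's sparse mechanism would additionally select which
            # branches/groups to keep, which is omitted here for brevity.
            out = out + torch.roll(conv(x), shifts=dy, dims=2)
        return out


if __name__ == "__main__":
    x = torch.randn(1, 32, 56, 56)
    print(ShiftWiseSketch(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```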
Related papers
- PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution [35.1473732030645]
Inspired by human vision, we propose a human-like peripheral convolution that reduces the parameter count of dense grid convolution by over 90%.
Our peripheral convolution behaves very similarly to human vision, reducing the complexity of convolution from O(K^2) to O(log K) without hurting performance.
For the first time, we successfully scale up the kernel size of CNNs to an unprecedented 101x101 and demonstrate consistent improvements.
arXiv Detail & Related papers (2024-03-12T12:19:05Z)
- InceptionNeXt: When Inception Meets ConvNeXt [167.61042926444105]
We build a series of networks, namely InceptionNeXt, which not only enjoy high throughputs but also maintain competitive performance.
InceptionNeXt achieves 1.6x higher training throughput than ConvNeXt-T, and attains a 0.2% top-1 accuracy improvement on ImageNet-1K.
arXiv Detail & Related papers (2023-03-29T17:59:58Z)
- ParCNetV2: Oversized Kernel with Enhanced Attention [60.141606180434195]
We introduce a convolutional neural network architecture named ParCNetV2.
It extends position-aware circular convolution (ParCNet) with oversized convolutions and strengthens attention through bifurcate gate units.
Our method outperforms other pure convolutional neural networks as well as neural networks hybridizing CNNs and transformers.
arXiv Detail & Related papers (2022-11-14T07:22:55Z)
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which, like ViTs, gains from increasing parameters and training data.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns from massive data with large-scale parameters, as ViTs do.
arXiv Detail & Related papers (2022-11-10T18:59:04Z)
- More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity [103.62784587778037]
Recently, a couple of advanced convolutional models have struck back with large kernels, motivated by the local but large attention mechanism.
We propose Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51x51 kernels that can perform on par with or better than state-of-the-art hierarchical Transformers.
arXiv Detail & Related papers (2022-07-07T23:55:52Z)
- Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [148.0476219278875]
We revisit large kernel design in modern convolutional neural networks (CNNs).
Inspired by recent advances of vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm.
We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to the commonly used 3x3.
arXiv Detail & Related papers (2022-03-13T17:22:44Z)
- Hyper-Convolutions via Implicit Kernels for Medical Imaging [18.98078260974008]
We present the hyper-convolution, a novel building block that implicitly encodes the convolutional kernel using spatial coordinates.
We demonstrate in our experiments that replacing regular convolutions with hyper-convolutions can improve performance with fewer parameters and increase robustness against noise (a generic sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-02-06T03:56:19Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
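For the hyper-convolution entry above, the mechanism it mentions (encoding the kernel implicitly from spatial coordinates) can be illustrated with a short, generic sketch: a tiny coordinate network maps normalized kernel positions to weights, so the parameter count depends on that network rather than on the kernel size. This is a hedged reconstruction of the idea, not the paper's code; the layer sizes and names below are assumptions.

```python
# Generic sketch of an "implicit kernel": an MLP maps normalized (dy, dx)
# kernel coordinates to weights. Illustrative only; sizes/names are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperConvSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 7, hidden: int = 32):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, kernel_size
        # Small network predicting one (out_ch * in_ch) weight vector
        # per kernel position.
        self.coord_net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, in_ch * out_ch),
        )
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, kernel_size),
            torch.linspace(-1, 1, kernel_size),
            indexing="ij",
        )
        self.register_buffer("coords", torch.stack([ys, xs], dim=-1))  # (k, k, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Generate the kernel on the fly, then run a standard convolution.
        w = self.coord_net(self.coords)                       # (k, k, out*in)
        w = w.view(self.k, self.k, self.out_ch, self.in_ch).permute(2, 3, 0, 1)
        return F.conv2d(x, w, padding=self.k // 2)


if __name__ == "__main__":
    x = torch.randn(1, 8, 32, 32)
    print(HyperConvSketch(8, 16)(x).shape)  # torch.Size([1, 16, 32, 32])
```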