InceptionNeXt: When Inception Meets ConvNeXt
- URL: http://arxiv.org/abs/2303.16900v1
- Date: Wed, 29 Mar 2023 17:59:58 GMT
- Title: InceptionNeXt: When Inception Meets ConvNeXt
- Authors: Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang
- Abstract summary: We build a series of networks, namely InceptionNeXt, which not only enjoy high throughput but also maintain competitive performance.
InceptionNeXt-T achieves 1.6x higher training throughput than ConvNeXt-T and attains a 0.2% top-1 accuracy improvement on ImageNet-1K.
- Score: 167.61042926444105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the long-range modeling ability of ViTs, large-kernel
convolutions have recently been widely studied and adopted to enlarge the
receptive field and improve model performance, as in the remarkable work
ConvNeXt, which employs 7x7 depthwise convolution. Although such a depthwise
operator consumes only a few FLOPs, it largely harms model efficiency on
powerful computing devices due to high memory access costs. For example,
ConvNeXt-T has FLOPs similar to ResNet-50 but achieves only 60% of its
throughput when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt
can improve speed, it results in significant performance degradation. It is
still unclear how to speed up large-kernel-based CNN models while preserving
their performance. To tackle this issue, inspired by Inceptions, we propose to
decompose large-kernel depthwise convolution into four parallel branches along
the channel dimension, i.e., a small square kernel, two orthogonal band kernels,
and an identity mapping. With this new Inception depthwise convolution, we build a
series of networks, namely InceptionNeXt, which not only enjoy high throughput
but also maintain competitive performance. For instance, InceptionNeXt-T
achieves 1.6x higher training throughput than ConvNeXt-T and attains a
0.2% top-1 accuracy improvement on ImageNet-1K. We anticipate InceptionNeXt can
serve as an economical baseline for future architecture design to reduce carbon
footprint. Code is available at https://github.com/sail-sg/inceptionnext.
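The four-branch decomposition described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`depthwise_conv2d`, `split_sizes`, `inception_dwconv`, `delta_kernel`) are hypothetical, and the 1/8 branch ratio and the kernel shapes used in the usage note are assumptions that the abstract does not state. Channels are split into an identity group, a square-kernel group, and two band-kernel groups (one horizontal, one vertical), each processed independently and then concatenated.

```python
def depthwise_conv2d(channel, kernel):
    """Apply one kh x kw kernel to one H x W channel (cross-correlation,
    zero padding, stride 1), so the output keeps the input size."""
    h, w = len(channel), len(channel[0])
    kh, kw = len(kernel), len(kernel[0])
    pad_h, pad_w = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - pad_h, j + dj - pad_w
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di][dj] * channel[y][x]
    return out


def split_sizes(num_channels, branch_ratio=0.125):
    """Channel counts for (identity, square, band_w, band_h).
    The 1/8 ratio is an assumed default, not taken from the abstract."""
    gc = int(num_channels * branch_ratio)
    return num_channels - 3 * gc, gc, gc, gc


def inception_dwconv(x, square_kernels, band_w_kernels, band_h_kernels):
    """x is a list of C channels, each an H x W list of lists.
    Each kernel list holds one kernel per channel of its branch;
    all remaining leading channels pass through unchanged."""
    n_conv = len(square_kernels) + len(band_w_kernels) + len(band_h_kernels)
    n_id = len(x) - n_conv
    if n_id < 0:
        raise ValueError("more kernels than channels")
    # Identity branch: copy the first n_id channels as-is.
    out = [[row[:] for row in ch] for ch in x[:n_id]]
    start = n_id
    # Square, horizontal-band, and vertical-band branches, in that order.
    for kernels in (square_kernels, band_w_kernels, band_h_kernels):
        group = x[start:start + len(kernels)]
        for ch, k in zip(group, kernels):
            out.append(depthwise_conv2d(ch, k))
        start += len(kernels)
    return out


def delta_kernel(kh, kw):
    """Kernel with a single 1.0 at its centre; convolving with it is a no-op."""
    k = [[0.0] * kw for _ in range(kh)]
    k[kh // 2][kw // 2] = 1.0
    return k
```

For example, with 8 channels and the assumed ratio, `split_sizes(8)` leaves five channels untouched and assigns one channel each to a square kernel (say 3x3), a horizontal band (say 1x11), and a vertical band (say 11x1). Only the convolved channels incur the memory traffic of a depthwise operator, which is the efficiency argument the abstract makes.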
Related papers
- Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations [17.41381592056492]
This paper proposes the paradigm of large convolutional kernels in designing modern Convolutional Neural Networks (ConvNets).
We establish that employing a few large kernels, instead of stacking multiple smaller ones, can be a superior design strategy.
We propose the UniRepLKNet architecture, which offers systematic architecture design principles specifically crafted for large-kernel ConvNets.
arXiv Detail & Related papers (2024-10-10T15:43:55Z)
- DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation [8.240211805240023]
We revisit the design of atrous convolutions in modern convolutional neural networks (CNNs).
We propose DSNet, a Dual-Branch CNN architecture, which incorporates atrous convolutions in the shallow layers of the model architecture.
Our models achieve a new state-of-the-art trade-off between accuracy and speed on ADE20K, Cityscapes and BDD datasets.
arXiv Detail & Related papers (2024-06-06T02:51:57Z)
- Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects [8.933264104073832]
Small convolutional kernels and convolution operations can closely approximate the effects of large kernel sizes.
We propose a shift-wise operator that ensures the CNNs capture long-range dependencies with the help of a sparse mechanism.
On ImageNet-1K, our shift-wise enhanced CNN model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2024-01-23T13:13:45Z)
- More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity [103.62784587778037]
Recently, a couple of advanced convolutional models strike back with large kernels motivated by the local but large attention mechanism.
We propose Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51x51 kernels that can perform on par with or better than state-of-the-art hierarchical Transformers.
arXiv Detail & Related papers (2022-07-07T23:55:52Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [148.0476219278875]
We revisit large kernel design in modern convolutional neural networks (CNNs).
Inspired by recent advances of vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm.
We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to commonly used 3x3.
arXiv Detail & Related papers (2022-03-13T17:22:44Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- XSepConv: Extremely Separated Convolution [60.90871656244126]
We propose a novel extremely separated convolutional block (XSepConv).
It fuses spatially separable convolutions into depthwise convolution to reduce both the computational cost and parameter size of large kernels.
XSepConv is designed to be an efficient alternative to vanilla depthwise convolution with large kernel sizes.
arXiv Detail & Related papers (2020-02-27T11:46:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.