UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale
- URL: http://arxiv.org/abs/2508.09000v1
- Date: Tue, 12 Aug 2025 15:11:18 GMT
- Title: UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale
- Authors: Yuhao Wang, Wei Xi,
- Abstract summary: We propose a universal model for ConvNets of any scale, termed UniConvNet. Experiments on ImageNet-1K, COCO 2017, and ADE20K demonstrate that UniConvNet outperforms state-of-the-art CNNs and ViTs. UniConvNet-T achieves $84.2\%$ ImageNet top-1 accuracy with $30M$ parameters and $5.1G$ FLOPs.
- Score: 6.1062169762251255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (ConvNets) with large effective receptive field (ERF), still in their early stages, have demonstrated promising effectiveness while constrained by high parameters and FLOPs costs and disrupted asymptotically Gaussian distribution (AGD) of ERF. This paper proposes an alternative paradigm: rather than merely employing extremely large ERF, it is more effective and efficient to expand the ERF while maintaining AGD of ERF by proper combination of smaller kernels, such as $7\times{7}$, $9\times{9}$, $11\times{11}$. This paper introduces a Three-layer Receptive Field Aggregator and designs a Layer Operator as the fundamental operator from the perspective of receptive field. The ERF can be expanded to the level of existing large-kernel ConvNets through the stack of proposed modules while maintaining AGD of ERF. Using these designs, we propose a universal model for ConvNet of any scale, termed UniConvNet. Extensive experiments on ImageNet-1K, COCO2017, and ADE20K demonstrate that UniConvNet outperforms state-of-the-art CNNs and ViTs across various vision recognition tasks for both lightweight and large-scale models with comparable throughput. Surprisingly, UniConvNet-T achieves $84.2\%$ ImageNet top-1 accuracy with $30M$ parameters and $5.1G$ FLOPs. UniConvNet-XL also shows competitive scalability to big data and large models, acquiring $88.4\%$ top-1 accuracy on ImageNet. Code and models are publicly available at https://github.com/ai-paperwithcode/UniConvNet.
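The receptive-field arithmetic behind combining $7\times7$, $9\times9$, and $11\times11$ kernels can be sketched as follows. This is a minimal illustration, not the paper's Three-layer Receptive Field Aggregator: it assumes the three kernels are applied sequentially with stride 1 and uniform weights, and the function names (`theoretical_rf`, `stacked_box_profile`) are hypothetical. It computes the theoretical receptive field of the stack and a 1-D profile of how much each input position contributes, which peaks at the center and decays toward the edges, roughly the bell shape that the abstract calls an asymptotically Gaussian distribution.

```python
def theoretical_rf(kernel_sizes, strides=None):
    """Theoretical receptive field (in input pixels) of sequential conv layers."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump = spacing, in input pixels, between adjacent outputs
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def convolve(a, b):
    """Full 1-D discrete convolution of two lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def stacked_box_profile(kernel_sizes):
    """1-D contribution profile of stacked uniform-weight kernels (assumption:
    real networks learn their weights, so this is only a shape illustration)."""
    profile = [1.0]
    for k in kernel_sizes:
        profile = convolve(profile, [1.0 / k] * k)
    return profile


sizes = [7, 9, 11]
rf = theoretical_rf(sizes)            # 1 + 6 + 8 + 10 = 25
profile = stacked_box_profile(sizes)  # 25 weights that sum to 1
center = len(profile) // 2
print(rf, len(profile), profile[0], profile[center])
```

Under these assumptions the stack covers a $25\times25$ window, while the weight at the border is far smaller than at the center, so the effective receptive field is narrower than the theoretical one.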
Related papers
- Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations [17.41381592056492]
This paper proposes the paradigm of large convolutional kernels in designing modern Convolutional Neural Networks (ConvNets).
We establish that employing a few large kernels, instead of stacking multiple smaller ones, can be a superior design strategy.
We propose the UniRepLKNet architecture, which offers systematic architecture design principles specifically crafted for large-kernel ConvNets.
arXiv Detail & Related papers (2024-10-10T15:43:55Z)
- Demystifying the Effect of Receptive Field Size in U-Net Models for Medical Image Segmentation [0.0]
This work explores the understudied aspect of receptive field (RF) size and its impact on the U-Net and Attention U-Net architectures.
The results demonstrate that there exists an optimal theoretical receptive field (TRF) size that strikes a balance between capturing a wider global context and maintaining computational efficiency.
A tool is also developed that calculates the TRF for a U-Net (or Attention U-Net) model and suggests an appropriate TRF size for a given model and dataset.
arXiv Detail & Related papers (2024-06-24T15:04:14Z)
- Fully $1\times1$ Convolutional Network for Lightweight Image Super-Resolution [79.04007257606862]
Deep models have made significant progress on single image super-resolution (SISR) tasks, in particular large models with large kernels ($3\times3$ or more).
$1\times1$ convolutions bring substantial computational efficiency, but struggle with aggregating local spatial representations.
We propose a simple yet effective fully $1\times1$ convolutional network, named Shift-Conv-based Network (SCNet).
arXiv Detail & Related papers (2023-07-30T06:24:03Z)
- GMConv: Modulating Effective Receptive Fields for Convolutional Kernels [52.50351140755224]
In convolutional neural networks, the convolutions are performed using a square kernel with a fixed $N \times N$ receptive field (RF).
Inspired by the property that ERFs typically exhibit a Gaussian distribution, we propose a Gaussian Mask convolutional kernel (GMConv) in this work.
Our GMConv can directly replace the standard convolutions in existing CNNs and can be easily trained end-to-end by standard back-propagation.
arXiv Detail & Related papers (2023-02-09T10:17:17Z)
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs.
arXiv Detail & Related papers (2022-11-10T18:59:04Z)
- MogaNet: Multi-order Gated Aggregation Network [61.842116053929736]
We propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning.
MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module.
MogaNet exhibits great scalability, impressive efficiency of parameters, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet.
arXiv Detail & Related papers (2022-11-07T04:31:17Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- Focal Modulation Networks [105.93086472906765]
Self-attention (SA) is completely replaced by a focal modulation network (FocalNet).
FocalNets with tiny and base sizes achieve 82.3% and 83.9% top-1 accuracy on ImageNet-1K.
FocalNets exhibit remarkable superiority when transferred to downstream tasks.
arXiv Detail & Related papers (2022-03-22T17:54:50Z)
- Pruning of Convolutional Neural Networks Using Ising Energy Model [45.4796383952516]
We propose an Ising energy model within an optimization framework for pruning convolutional kernels and hidden units.
Our experiments using ResNets, AlexNet, and SqueezeNet on CIFAR-10 and CIFAR-100 datasets show that the proposed method can achieve, on average, a pruning rate of more than $50\%$ of the trainable parameters.
arXiv Detail & Related papers (2021-02-10T14:00:39Z)