Related papers: E-ConvNeXt: A Lightweight and Efficient ConvNeXt Variant with Cross-Stage Partial Connections

URL: http://arxiv.org/abs/2508.20955v1
Date: Thu, 28 Aug 2025 16:17:19 GMT
Title: E-ConvNeXt: A Lightweight and Efficient ConvNeXt Variant with Cross-Stage Partial Connections
Authors: Fang Wang, Huitao Li, Wenhan Chao, Zheng Zhuo, Yiran Ji, Chang Peng, Yupeng Sun
Abstract summary: E-ConvNeXt can maintain high accuracy under different complexity configurations. E-ConvNeXt-mini reaches 78.3% Top-1 accuracy at 0.9 GFLOPs; E-ConvNeXt-small reaches 81.9% Top-1 accuracy at 3.1 GFLOPs.
Score: 4.207343875949465
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many high-performance networks were not designed with lightweight application scenarios in mind, which has greatly restricted their scope of application. This paper takes ConvNeXt as the research object and significantly reduces its parameter count and network complexity by integrating the Cross Stage Partial Connections mechanism and a series of optimized designs. The new network, named E-ConvNeXt, maintains high accuracy under different complexity configurations. The three core innovations of E-ConvNeXt are: (1) integrating the Cross Stage Partial Network (CSPNet) with ConvNeXt and adjusting the network structure, which reduces the model's network complexity by up to 80%; (2) optimizing the Stem and Block structures to enhance the model's feature expression capability and operational efficiency; (3) replacing Layer Scale with channel attention. Experimental validation on ImageNet classification demonstrates E-ConvNeXt's superior accuracy-efficiency balance: E-ConvNeXt-mini reaches 78.3% Top-1 accuracy at 0.9 GFLOPs, and E-ConvNeXt-small reaches 81.9% Top-1 accuracy at 3.1 GFLOPs. Transfer learning tests on object detection tasks further confirm its generalization capability.
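
To give a feel for why a cross-stage partial split can cut complexity so sharply, here is a rough parameter-count sketch. It is not the authors' architecture: the block layout follows the standard ConvNeXt block (7x7 depthwise conv, LayerNorm, two pointwise layers with 4x expansion, Layer Scale), and the CSP layout (a 1x1 entry conv, half the channels routed through the blocks, a 1x1 fusion conv) is an assumption made for illustration. All function names are hypothetical, and the numbers it prints are not meant to reproduce the figures reported in the abstract.

```python
# Illustrative parameter arithmetic only; the layouts below are assumptions,
# not the E-ConvNeXt design described in the paper.

def convnext_block_params(channels, expansion=4, kernel=7):
    """Parameters of one standard ConvNeXt block with `channels` channels."""
    depthwise = channels * kernel * kernel + channels      # weights + bias
    layer_norm = 2 * channels                              # scale + shift
    hidden = expansion * channels
    pointwise_up = channels * hidden + hidden              # C -> 4C
    pointwise_down = hidden * channels + channels          # 4C -> C
    layer_scale = channels                                 # per-channel gamma
    return depthwise + layer_norm + pointwise_up + pointwise_down + layer_scale


def conv1x1_params(c_in, c_out):
    """Parameters of a 1x1 convolution with bias."""
    return c_in * c_out + c_out


def plain_stage_params(channels, depth):
    """A stage made of `depth` full-width ConvNeXt blocks."""
    return depth * convnext_block_params(channels)


def csp_stage_params(channels, depth, split_ratio=0.5):
    """Hypothetical CSP stage: only a fraction of the channels pass through
    the blocks; the remaining branch bypasses them and is fused afterwards."""
    part = int(channels * split_ratio)
    entry = conv1x1_params(channels, 2 * part)   # project into two branches
    blocks = depth * convnext_block_params(part)
    fuse = conv1x1_params(2 * part, channels)    # merge the two branches
    return entry + blocks + fuse


# Example stage: 384 channels, 9 blocks (values chosen for illustration).
plain = plain_stage_params(384, 9)
csp = csp_stage_params(384, 9)
reduction = 1 - csp / plain
print(f"plain: {plain:,}  csp: {csp:,}  reduction: {reduction:.1%}")
```

Because the pointwise layers scale quadratically with channel width, halving the width inside the blocks removes roughly three quarters of their parameters, and the two extra 1x1 convolutions give back only a small part of that saving.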

Related papers

EVCC: Enhanced Vision Transformer-ConvNeXt-CoAtNet Fusion for Classification [0.5394291557377919]
Hybrid vision architectures combining Transformers and CNNs have significantly advanced image classification, but they usually do so at significant computational cost. We introduce EVCC, a novel multi-branch architecture integrating the Vision Transformer, lightweight ConvNeXt, and CoAtNet. Experiments across the CIFAR-100, Tobacco3482, CelebA, and Brain Cancer datasets demonstrate EVCC's superiority over powerful models.
arXiv Detail & Related papers (2025-11-24T02:11:19Z)
FORTRESS: Function-composition Optimized Real-Time Resilient Structural Segmentation via Kolmogorov-Arnold Enhanced Spatial Attention Networks [1.663204995903499]
FORTRESS (Function-composition Optimized Real-Time Resilient Structural Segmentation) is a new architecture that balances accuracy and speed. FORTRESS incorporates three key innovations: a systematic depthwise separable convolution framework, adaptive TiKAN integration, and multi-scale attention fusion. The architecture achieves remarkable efficiency gains with 91% parameter reduction (31M to 2.9M), 91% computational complexity reduction (13.7 to 1.17 GFLOPs), and 3x inference speed improvement.
arXiv Detail & Related papers (2025-07-16T23:17:58Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition [63.93802691275012]
We propose a lightweight Dual Dynamic Token Mixer (D-Mixer) to simultaneously learn global and local dynamics. We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network. In ImageNet-1K classification, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy while requiring less than half of the computational cost.
arXiv Detail & Related papers (2023-10-30T09:35:56Z)
Towards Simple and Accurate Human Pose Estimation with Stair Network [34.421529219040295]
We develop a small yet discriminative model called STair Network, which can be stacked towards an accurate multi-stage pose estimation system. To reduce computational cost, STair Network is composed of novel basic feature extraction blocks. We demonstrate the effectiveness of the STair Network on two standard datasets.
arXiv Detail & Related papers (2022-02-18T10:37:13Z)
FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking. We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints. FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually-designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
MUXConv: Information Multiplexing in Convolutional Neural Networks [25.284420772533572]
MUXConv is designed to increase the flow of information by progressively multiplexing channel and spatial information in the network. On ImageNet, the resulting models, dubbed MUXNets, match the performance (75.3% top-1 accuracy) and multiply-add operations (218M) of MobileNetV3. MUXNet also performs well under transfer learning and when adapted to object detection.
arXiv Detail & Related papers (2020-03-31T00:09:47Z)
ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost. We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts. We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-07T02:12:02Z)
Toward fast and accurate human pose estimation via soft-gated skip connections [97.06882200076096]
This paper is on highly accurate and highly efficient human pose estimation. We re-analyze this design choice in the context of improving both the accuracy and the efficiency over the state-of-the-art. Our model achieves state-of-the-art results on the MPII and LSP datasets.
arXiv Detail & Related papers (2020-02-25T18:51:51Z)
Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters. Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques. We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)