MogaNet: Multi-order Gated Aggregation Network
- URL: http://arxiv.org/abs/2211.03295v3
- Date: Fri, 16 Feb 2024 14:17:23 GMT
- Title: MogaNet: Multi-order Gated Aggregation Network
- Authors: Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu,
Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li
- Abstract summary: We propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning.
MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module.
MogaNet exhibits great scalability, impressive efficiency of parameters, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet.
- Score: 64.16774341908365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By contextualizing the kernel as globally as possible, modern ConvNets
have shown great potential in computer vision tasks. However, recent progress on
\textit{multi-order game-theoretic interaction} within deep neural networks
(DNNs) reveals the representation bottleneck of modern ConvNets, where the
expressive interactions have not been effectively encoded with the increased
kernel size. To tackle this challenge, we propose a new family of modern
ConvNets, dubbed MogaNet, for discriminative visual representation learning in
pure ConvNet-based models with favorable complexity-performance trade-offs.
MogaNet encapsulates conceptually simple yet effective convolutions and gated
aggregation into a compact module, where discriminative features are
efficiently gathered and contextualized adaptively. MogaNet exhibits great
scalability, impressive efficiency of parameters, and competitive performance
compared to state-of-the-art ViTs and ConvNets on ImageNet and various
downstream vision benchmarks, including COCO object detection, ADE20K semantic
segmentation, 2D\&3D human pose estimation, and video prediction. Notably,
MogaNet hits 80.0\% and 87.8\% accuracy with 5.2M and 181M parameters on
ImageNet-1K, outperforming ParC-Net and ConvNeXt-L, while saving 59\% FLOPs and
17M parameters, respectively. The source code is available at
\url{https://github.com/Westlake-AI/MogaNet}.
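As a rough illustration of the module described in the abstract, the following PyTorch sketch pairs a multi-order context branch (depthwise convolutions at several dilation rates) with an elementwise gating branch. It is a simplified, hypothetical rendering, not the official implementation in the linked repository; all layer choices here are assumptions.

```python
import torch
import torch.nn as nn

class GatedAggregation(nn.Module):
    """Simplified gated-aggregation block (illustrative, not the official code).

    A context branch mixes features with depthwise convolutions at two
    dilation rates (multi-order context); a gate branch modulates the
    aggregated context elementwise before a final channel projection.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Conv2d(dim, dim, kernel_size=1)
        self.context = nn.Sequential(
            nn.Conv2d(dim, dim, 5, padding=2, groups=dim),             # local context
            nn.Conv2d(dim, dim, 5, padding=4, dilation=2, groups=dim), # dilated, wider context
        )
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product of the activated gate and the aggregated context.
        return self.proj(self.act(self.gate(x)) * self.act(self.context(x)))

x = torch.randn(1, 64, 56, 56)
print(GatedAggregation(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

The gating product lets the network emphasize or suppress context per position, which is one way to read the abstract's "gathered and contextualized adaptively".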
Related papers
- Designing Concise ConvNets with Columnar Stages [33.248031676529635]
We introduce a refreshing ConvNet macro design called Columnar Stage Network (CoSNet).
CoSNet has a systematically developed, simple and concise structure: small depth, a low parameter count, low FLOPs, and attention-free operations.
Our evaluations show that CoSNet rivals many renowned ConvNets and Transformer designs under resource-constrained scenarios.
arXiv Detail & Related papers (2024-10-05T09:03:42Z)
- DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation [8.240211805240023]
We revisit the design of atrous convolutions in modern convolutional neural networks (CNNs).
We propose DSNet, a Dual-Branch CNN architecture, which incorporates atrous convolutions in the shallow layers of the model architecture.
Our models achieve a new state-of-the-art trade-off between accuracy and speed on ADE20K, Cityscapes and BDD datasets.
arXiv Detail & Related papers (2024-06-06T02:51:57Z)
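For background on the entry above, the snippet below contrasts a plain 3x3 convolution with an atrous (dilated) one: the dilated kernel covers a larger receptive field at the same parameter cost. It illustrates the operator only, not DSNet's dual-branch architecture.

```python
import torch
import torch.nn as nn

# A plain 3x3 convolution versus an atrous (dilated) one: the dilated kernel
# samples inputs 2 pixels apart, covering a 5x5 receptive field with 3x3 weights.
conv = nn.Conv2d(32, 32, kernel_size=3, padding=1)
atrous = nn.Conv2d(32, 32, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 32, 64, 64)
assert conv(x).shape == atrous(x).shape  # same resolution, larger receptive field
print(sum(p.numel() for p in conv.parameters()),
      sum(p.numel() for p in atrous.parameters()))  # identical parameter counts
```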
- UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition [61.01408259741114]
We propose four architectural guidelines for designing large-kernel convolutional neural networks (ConvNets).
Our proposed large-kernel ConvNet shows leading performance in image recognition.
We discover large kernels are the key to unlocking the exceptional performance of ConvNets in domains where they were originally not proficient.
arXiv Detail & Related papers (2023-11-27T07:48:50Z)
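The core operator behind large-kernel ConvNets such as the one above is a depthwise convolution with a large spatial extent, usually followed by a 1x1 convolution for channel mixing. A minimal sketch, where the kernel size and width are illustrative assumptions:

```python
import torch
import torch.nn as nn

dim = 96
# Depthwise 13x13 convolution: groups=dim means one 13x13 filter per channel,
# so the receptive field grows without mixing channels; the 1x1 conv mixes them.
large_kernel = nn.Sequential(
    nn.Conv2d(dim, dim, kernel_size=13, padding=6, groups=dim),
    nn.Conv2d(dim, dim, kernel_size=1),
)

x = torch.randn(1, dim, 56, 56)
print(large_kernel(x).shape)  # torch.Size([1, 96, 56, 56])
```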
- Are Large Kernels Better Teachers than Transformers for ConvNets? [82.4742785108714]
This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural Networks (ConvNets): serving as the teacher in Knowledge Distillation (KD) for small-kernel ConvNets.
arXiv Detail & Related papers (2023-05-30T21:05:23Z)
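The distillation setup referenced above can be summarized with the standard soft-label KD loss, where a large-kernel teacher's softened logits supervise a small-kernel student. The temperature and weighting below are generic placeholders, not the paper's recipe:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float = 4.0):
    """Standard Hinton-style knowledge distillation loss (generic sketch).

    Here the teacher would be a large-kernel ConvNet; its softened class
    distribution supervises the small-kernel student.
    """
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # T^2 keeps gradient magnitudes comparable with the hard-label loss.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T

print(kd_loss(torch.randn(8, 1000), torch.randn(8, 1000)))
```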
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders [104.05133094625137]
We propose a fully convolutional masked autoencoder framework and a new Global Response Normalization layer.
This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets.
arXiv Detail & Related papers (2023-01-02T18:59:31Z)
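Global Response Normalization has a compact formulation: per-channel global L2 norms are divisively normalized across channels and used to recalibrate the input. The sketch below follows that published formulation for NCHW tensors (the official code operates channels-last), with the epsilon placement as an assumption:

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization, per the ConvNeXt V2 formulation (sketch).

    gx: L2 norm of each channel over spatial positions (global aggregation).
    nx: gx divided by its mean over channels (divisive normalization).
    The input is recalibrated by nx, with a learnable affine and a residual.
    """

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, dim, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, dim, 1, 1))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        gx = x.pow(2).sum(dim=(2, 3), keepdim=True).sqrt()   # (N, C, 1, 1)
        nx = gx / (gx.mean(dim=1, keepdim=True) + self.eps)  # (N, C, 1, 1)
        return self.gamma * (x * nx) + self.beta + x

print(GRN(64)(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```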
- PSO-Convolutional Neural Networks with Heterogeneous Learning Rate [4.243356707599486]
Convolutional Neural Networks (ConvNets or CNNs) have been widely deployed in computer vision and related fields.
In this article, we propose a novel Particle Swarm Optimization (PSO) based training for ConvNets.
In this framework, the weight vector of each ConvNet is treated as a particle in phase space, and PSO dynamics are intertwined with Stochastic Gradient Descent (SGD) to boost training performance and generalization.
arXiv Detail & Related papers (2022-05-20T22:47:19Z)
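The hybrid training scheme above can be caricatured in a few lines: each particle is one network's flattened weight vector, and a PSO velocity update is blended with a per-network gradient step. This is a schematic of the general idea with made-up hyperparameters, not the paper's algorithm:

```python
import torch

def pso_sgd_step(weights, velocities, grads, best_local, best_global,
                 lr=0.01, inertia=0.7, c1=1.5, c2=1.5):
    """One hybrid PSO/SGD update over a population of weight vectors (sketch).

    weights, velocities, grads, best_local: lists of tensors, one per network.
    best_global: the best weight vector found by any particle so far.
    """
    for i in range(len(weights)):
        r1, r2 = torch.rand(1).item(), torch.rand(1).item()
        velocities[i] = (inertia * velocities[i]
                         + c1 * r1 * (best_local[i] - weights[i])
                         + c2 * r2 * (best_global - weights[i]))
        # Blend swarm dynamics with a plain gradient-descent step.
        weights[i] = weights[i] + velocities[i] - lr * grads[i]
    return weights, velocities

n, dim = 4, 10
w = [torch.randn(dim) for _ in range(n)]
v = [torch.zeros(dim) for _ in range(n)]
g = [torch.randn(dim) for _ in range(n)]
w, v = pso_sgd_step(w, v, g, [t.clone() for t in w], w[0].clone())
```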
- Bottleneck Transformers for Visual Recognition [97.16013761605254]
We present BoTNet, a powerful backbone architecture that incorporates self-attention for vision tasks.
We present models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark.
We hope our simple and effective approach will serve as a strong baseline for future research in self-attention models for vision.
arXiv Detail & Related papers (2021-01-27T18:55:27Z)
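BoTNet's change is easy to state: in the final ResNet stage, the 3x3 convolution inside each bottleneck is replaced with multi-head self-attention over the spatial grid. The sketch below omits BoTNet's relative position encodings and stride handling, and its layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class BoTBlockSketch(nn.Module):
    """Bottleneck block with self-attention in place of the 3x3 conv (sketch)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.reduce = nn.Conv2d(dim, dim // 4, 1)
        self.attn = nn.MultiheadAttention(dim // 4, heads, batch_first=True)
        self.expand = nn.Conv2d(dim // 4, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        y = self.reduce(x)                  # 1x1: channel reduction
        seq = y.flatten(2).transpose(1, 2)  # (N, H*W, C'): pixels become tokens
        seq, _ = self.attn(seq, seq, seq)   # global self-attention over the grid
        y = seq.transpose(1, 2).reshape(n, -1, h, w)
        return x + self.expand(y)           # 1x1 expansion plus residual

print(BoTBlockSketch(256)(torch.randn(1, 256, 14, 14)).shape)
```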
- Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets [65.28292822614418]
The giant formula for simultaneously enlarging resolution, depth, and width provides a Rubik's cube for neural networks.
This paper aims to explore the twisting rules for obtaining deep neural networks with minimal model size and computational cost.
arXiv Detail & Related papers (2020-10-28T08:49:45Z)
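The "giant formula" referenced above is the EfficientNet-style compound scaling rule, which twists resolution, depth, and width with a single exponent; TinyNet studies running it in reverse to shrink models. The constants below are EfficientNet's, not TinyNet's derived rules:

```python
# EfficientNet-style compound scaling: one knob (phi) twists resolution,
# depth, and width together; negative phi shrinks the model, which is the
# regime TinyNet explores. Alpha/beta/gamma are EfficientNet's constants.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution multipliers

def scale(base_depth: int, base_width: int, base_res: int, phi: float):
    return (round(base_depth * ALPHA ** phi),
            round(base_width * BETA ** phi),
            round(base_res * GAMMA ** phi))

for phi in (1, 0, -1, -2):  # enlarge, baseline, and two shrunken variants
    print(phi, scale(18, 64, 224, phi))
```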
- DRU-net: An Efficient Deep Convolutional Neural Network for Medical Image Segmentation [2.3574651879602215]
Residual networks (ResNet) and densely connected networks (DenseNet) have significantly improved the training efficiency and performance of deep convolutional neural networks (DCNNs).
We propose an efficient network architecture that combines the advantages of both networks.
arXiv Detail & Related papers (2020-04-28T12:16:24Z)
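One way to combine the two ideas mentioned above is a block that keeps both a ResNet-style additive shortcut and a DenseNet-style concatenation. This is a hypothetical illustration of "advantages of both networks", not DRU-net's published block:

```python
import torch
import torch.nn as nn

class ResDenseBlock(nn.Module):
    """Hypothetical block mixing residual addition and dense concatenation."""

    def __init__(self, in_ch: int, growth: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, growth, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(growth, growth, 3, padding=1),
        )
        self.shortcut = nn.Conv2d(in_ch, growth, 1)  # projected residual path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv(x) + self.shortcut(x)  # ResNet-style addition
        return torch.cat([x, y], dim=1)      # DenseNet-style concatenation

x = torch.randn(1, 32, 64, 64)
print(ResDenseBlock(32, 16)(x).shape)  # channels grow: 32 + 16 = 48
```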
- DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks [16.169176006544436]
We propose a novel dynamic convolution method to adaptively generate convolution kernels based on image content.
Built on the MobileNetV3-Small/Large architectures, DyNet achieves 70.3%/77.1% Top-1 accuracy on ImageNet, an improvement of 2.9%/1.9%.
arXiv Detail & Related papers (2020-04-22T16:58:05Z)
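Dynamic convolution of this kind is typically implemented as a content-dependent mixture over a small bank of kernels: a lightweight routing head predicts mixing coefficients from the input, and the effective kernel is their weighted sum. The sketch below follows that generic pattern (as in dynamic/conditional convolutions), not DyNet's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Content-adaptive convolution: mixes K candidate kernels per input (sketch)."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_kernels: int = 4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        self.router = nn.Linear(in_ch, num_kernels)  # predicts mixing coefficients
        self.pad = k // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global-average-pool the input, then softmax over the kernel bank.
        coeff = F.softmax(self.router(x.mean(dim=(2, 3))), dim=-1)  # (N, K)
        outs = []
        for i in range(x.size(0)):  # per-sample kernel (loop kept for clarity)
            w = torch.einsum("k,koihw->oihw", coeff[i], self.weight)
            outs.append(F.conv2d(x[i:i + 1], w, padding=self.pad))
        return torch.cat(outs, dim=0)

print(DynamicConv2d(16, 32)(torch.randn(2, 16, 28, 28)).shape)
```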
This list is automatically generated from the titles and abstracts of the papers on this site.