Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention
and Residual Connection in Kernel Space
- URL: http://arxiv.org/abs/2304.07254v1
- Date: Thu, 13 Apr 2023 05:22:24 GMT
- Title: Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention
and Residual Connection in Kernel Space
- Authors: Seokju Yun, Youngmin Ro
- Abstract summary: Dynamic Mobile-Former maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators.
A Transformer in Dynamic Mobile-Former only requires a few randomly initialized tokens to calculate global features.
The bridge between Dynamic MobileNet and the Transformer allows for bidirectional integration of local and global features.
- Score: 4.111899441919165
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce Dynamic Mobile-Former (DMF), which maximizes the capabilities of
dynamic convolution by harmonizing it with efficient operators. Our Dynamic
Mobile-Former effectively utilizes the advantages of Dynamic MobileNet
(MobileNet equipped with dynamic convolution) using global information from
light-weight attention. A Transformer in Dynamic Mobile-Former only requires a
few randomly initialized tokens to calculate global features, making it
computationally efficient. A bridge between Dynamic MobileNet and the
Transformer allows for bidirectional integration of local and global
features. We also simplify the optimization process of vanilla dynamic
convolution by splitting the convolution kernel into an input-agnostic kernel
and an input-dependent kernel. This allows for optimization in a wider kernel
space, resulting in enhanced capacity. By integrating lightweight attention and
enhanced dynamic convolution, our Dynamic Mobile-Former achieves not only high
efficiency but also strong performance. We benchmark Dynamic Mobile-Former
on a series of vision tasks and show that it achieves impressive
performance on image classification, COCO detection, and instance
segmentation. For example, our DMF reaches a top-1 accuracy of 79.4% on
ImageNet-1K, 4.3% higher than PVT-Tiny with only 1/4 of the
FLOPs. Additionally, our proposed DMF-S model performs well on challenging
vision datasets such as COCO, achieving 39.0% mAP, which is 1% higher than
that of the Mobile-Former 508M model despite using 3 GFLOPs less
computation. Code and models are available at https://github.com/ysj9909/DMF
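The kernel-space split described in the abstract (an input-agnostic kernel plus an input-dependent kernel) can be illustrated with a short sketch. The PyTorch-style module below is a minimal illustration under assumed design choices, not the authors' implementation (see the linked repository for that); all class, parameter, and hyper-parameter names are hypothetical. A shared static kernel acts as a residual in kernel space, while a lightweight routing head predicts per-sample mixture coefficients for a small bank of dynamic kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitDynamicConv2d(nn.Module):
    """Sketch of a dynamic convolution whose effective kernel is
    W(x) = W_static + sum_k pi_k(x) * W_k, i.e. an input-agnostic kernel
    plus an input-dependent kernel (a residual connection in kernel space).
    Names are illustrative, not taken from the DMF code base."""

    def __init__(self, in_ch, out_ch, kernel_size=3, num_experts=4, reduction=4):
        super().__init__()
        self.out_ch, self.kernel_size = out_ch, kernel_size
        # Input-agnostic kernel, shared by every input.
        self.static_weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        # Bank of kernels combined with input-dependent coefficients.
        self.expert_weights = nn.Parameter(
            torch.randn(num_experts, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        # Lightweight routing head: global average pool -> MLP -> softmax over the bank.
        self.router = nn.Sequential(
            nn.Linear(in_ch, in_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_ch // reduction, num_experts))

    def forward(self, x):
        b, c, h, w = x.shape
        # Per-sample mixture coefficients pi(x).
        pi = F.softmax(self.router(x.mean(dim=(2, 3))), dim=-1)           # (b, K)
        dyn = torch.einsum('bk,koicd->boicd', pi, self.expert_weights)    # input-dependent kernel
        weight = self.static_weight.unsqueeze(0) + dyn                    # residual in kernel space
        # Apply a different kernel to each sample via the grouped-conv trick.
        weight = weight.reshape(b * self.out_ch, c, self.kernel_size, self.kernel_size)
        out = F.conv2d(x.reshape(1, b * c, h, w), weight,
                       padding=self.kernel_size // 2, groups=b)
        return out.reshape(b, self.out_ch, h, w)
```

Decoupling the two terms lets the static part carry the input-agnostic component while the dynamic part only has to model the per-sample deviation, which is how the abstract motivates optimizing in a wider kernel space.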
Related papers
- KernelWarehouse: Rethinking the Design of Dynamic Convolution [16.101179962553385]
KernelWarehouse redefines the basic concepts of "kernels", "assembling kernels" and "attention function".
We testify the effectiveness of KernelWarehouse on ImageNet and MS-COCO datasets using various ConvNet architectures.
arXiv Detail & Related papers (2024-06-12T05:16:26Z)
- Efficient Modulation for Vision Networks [122.1051910402034]
We propose efficient modulation, a novel design for efficient vision networks.
We demonstrate that the modulation mechanism is particularly well suited for efficient networks.
Our network achieves a better trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2024-03-29T03:48:35Z)
- SGDM: Static-Guided Dynamic Module Make Stronger Visual Models [0.9012198585960443]
The spatial attention mechanism has been widely used to improve object detection performance.
We propose Razor Dynamic Convolution (RDConv) to address the two flaws in dynamic weight convolution.
We introduce the mechanism of shared weights in static convolution to solve the problem of dynamic convolution being sensitive to high-frequency noise.
arXiv Detail & Related papers (2024-03-27T06:18:40Z)
- Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications [108.44482683870888]
We introduce Deformable Convolution v4 (DCNv4), a highly efficient and effective operator designed for a broad spectrum of vision applications.
DCNv4 addresses the limitations of its predecessor, DCNv3, with two key enhancements.
It demonstrates exceptional performance across various tasks, including image classification, instance and semantic segmentation, and notably, image generation.
arXiv Detail & Related papers (2024-01-11T14:53:24Z)
- DAT++: Spatially Dynamic Vision Transformer with Deformable Attention [87.41016963608067]
We present Deformable Attention Transformer (DAT++), an efficient and effective vision backbone for visual recognition.
DAT++ achieves state-of-the-art results on various visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
arXiv Detail & Related papers (2023-09-04T08:26:47Z)
- Vision Transformer Computation and Resilience for Dynamic Inference [3.6929360462568077]
We leverage the resilience of vision transformers to pruning and switch between different scaled versions of a model.
Most FLOPs are generated by convolutions, not attention.
Some models are fairly resilient and their model execution can be adapted without retraining.
arXiv Detail & Related papers (2022-12-06T01:10:31Z)
- PAD-Net: An Efficient Framework for Dynamic Networks [72.85480289152719]
Common practice in implementing dynamic networks is to convert the given static layers into fully dynamic ones.
We propose a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones.
Our method is comprehensively supported by large-scale experiments with two typical advanced dynamic architectures.
arXiv Detail & Related papers (2022-11-10T12:42:43Z)
- SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution [16.56592303409295]
Dynamic convolution achieves better performance for efficient CNNs at the cost of negligible FLOPs increase.
We propose a new framework, Sparse Dynamic Convolution (SD-Conv), to naturally integrate these two paths.
arXiv Detail & Related papers (2022-04-05T14:03:54Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Revisiting Dynamic Convolution via Matrix Decomposition [81.89967403872147]
We propose dynamic channel fusion to replace dynamic attention over channel groups.
Our method is easier to train and requires significantly fewer parameters without sacrificing accuracy.
arXiv Detail & Related papers (2021-03-15T23:03:18Z)
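The last entry above ("Revisiting Dynamic Convolution via Matrix Decomposition") takes a related route, replacing per-kernel attention with dynamic channel fusion in a small latent space. The sketch below is a rough, hypothetical illustration of that general idea for a 1x1 convolution, with assumed names and dimensions; it is not the paper's code.

```python
import torch
import torch.nn as nn

class DynamicChannelFusion1x1(nn.Module):
    """Sketch of a 1x1 convolution with a low-rank dynamic term:
    W(x) = W0 + P @ Phi(x) @ Q^T, where Phi(x) is a small L x L
    channel-fusion matrix predicted per sample. Names are illustrative."""

    def __init__(self, in_ch, out_ch, latent=8, reduction=4):
        super().__init__()
        self.latent = latent
        self.W0 = nn.Parameter(torch.randn(out_ch, in_ch) * 0.02)   # static 1x1 kernel
        self.P = nn.Parameter(torch.randn(out_ch, latent) * 0.02)   # latent -> output channels
        self.Q = nn.Parameter(torch.randn(in_ch, latent) * 0.02)    # input channels -> latent
        # Predict the L*L fusion matrix Phi(x) from globally pooled features.
        self.phi_head = nn.Sequential(
            nn.Linear(in_ch, in_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_ch // reduction, latent * latent))

    def forward(self, x):
        b, c, h, w = x.shape
        phi = self.phi_head(x.mean(dim=(2, 3))).view(b, self.latent, self.latent)
        # Per-sample 1x1 kernel: W(x) = W0 + P @ Phi(x) @ Q^T, shape (b, out_ch, in_ch).
        dyn = torch.einsum('ol,bls,cs->boc', self.P, phi, self.Q)
        weight = self.W0.unsqueeze(0) + dyn
        # Apply each sample's kernel as a channel-mixing matmul (equivalent to a 1x1 conv).
        return torch.einsum('boc,bchw->bohw', weight, x)
```

Because the dynamic term lives in a small L-dimensional latent space rather than over full kernels, the number of per-sample parameters stays low, which matches the parameter-efficiency claim summarized in the entry above.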