Related papers: SGDM: Static-Guided Dynamic Module Make Stronger Visual Models

SGDM: Static-Guided Dynamic Module Make Stronger Visual Models

URL: http://arxiv.org/abs/2403.18282v1
Date: Wed, 27 Mar 2024 06:18:40 GMT
Title: SGDM: Static-Guided Dynamic Module Make Stronger Visual Models
Authors: Wenjie Xing, Zhenchao Cui, Jing Qi,
Abstract summary: spatial attention mechanism has been widely used to improve object detection performance. We propose Razor Dynamic Convolution (RDConv) to address thetwo flaws in dynamic weight convolution. We introduce the mechanism of shared weights in static convolution to solve the problem of dynamic convolution being sensitive to high-frequency noise.
Score: 0.9012198585960443
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The spatial attention mechanism has been widely used to improve object detection performance. However, its operation is currently limited to static convolutions lacking content-adaptive features. This paper innovatively approaches from the perspective of dynamic convolution. We propose Razor Dynamic Convolution (RDConv) to address thetwo flaws in dynamic weight convolution, making it hard to implement in spatial mechanism: 1) it is computation-heavy; 2) when generating weights, spatial information is disregarded. Firstly, by using Razor Operation to generate certain features, we vastly reduce the parameters of the entire dynamic convolution operation. Secondly, we added a spatial branch inside RDConv to generate convolutional kernel parameters with richer spatial information. Embedding dynamic convolution will also bring the problem of sensitivity to high-frequency noise. We propose the Static-Guided Dynamic Module (SGDM) to address this limitation. By using SGDM, we utilize a set of asymmetric static convolution kernel parameters to guide the construction of dynamic convolution. We introduce the mechanism of shared weights in static convolution to solve the problem of dynamic convolution being sensitive to high-frequency noise. Extensive experiments illustrate that multiple different object detection backbones equipped with SGDM achieve a highly competitive boost in performance(e.g., +4% mAP with YOLOv5n on VOC and +1.7% mAP with YOLOv8n on COCO) with negligible parameter increase(i.e., +0.33M on YOLOv5n and +0.19M on YOLOv8n).

Related papers

ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z)
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes [71.61083731844282]
We present DeSiRe-GS, a self-supervised gaussian splatting representation. It enables effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios.
arXiv Detail & Related papers (2024-11-18T05:49:16Z)
Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space [4.111899441919165]
Dynamic Mobile-Former maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators. PVT.A Transformer in Dynamic Mobile-Former only requires a few randomly calculate global features. Bridge between Dynamic MobileNet and Transformer allows for bidirectional integration of local and global features.
arXiv Detail & Related papers (2023-04-13T05:22:24Z)
Adaptive Dynamic Filtering Network for Image Denoising [8.61083713580388]
In image denoising networks, feature scaling is widely used to enlarge the receptive field size and reduce computational costs. We propose to employ dynamic convolution to improve the learning of high-frequency and multi-scale features. We build an efficient denoising network with the proposed DCB and MDCB, named ADFNet.
arXiv Detail & Related papers (2022-11-22T06:54:27Z)
PAD-Net: An Efficient Framework for Dynamic Networks [72.85480289152719]
Common practice in implementing dynamic networks is to convert the given static layers into fully dynamic ones. We propose a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones. Our method is comprehensively supported by large-scale experiments with two typical advanced dynamic architectures.
arXiv Detail & Related papers (2022-11-10T12:42:43Z)
Omni-Dimensional Dynamic Convolution [25.78940854339179]
Learning a single static convolutional kernel in each convolutional layer is the common training paradigm of modern Convolutional Neural Networks (CNNs) Recent research in dynamic convolution shows that learning a linear combination of $n$ convolutional kernels weighted with their input-dependent attentions can significantly improve the accuracy of light-weight CNNs. We present Omni-dimensional Dynamic Convolution (ODConv), a more generalized yet elegant dynamic convolution design.
arXiv Detail & Related papers (2022-09-16T14:05:38Z)
SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution [16.56592303409295]
Dynamic convolution achieves better performance for efficient CNNs at the cost of negligible FLOPs increase. We propose a new framework, textbfSparse Dynamic Convolution (textscSD-Conv), to naturally integrate these two paths.
arXiv Detail & Related papers (2022-04-05T14:03:54Z)
Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks [82.18396309806577]
We propose a novel activation quantizer, referred to as Dynamic Dual Trainable Bounds (DDTB) Our DDTB exhibits significant performance improvements in ultra-low precision. For example, our DDTB achieves a 0.70dB PSNR increase on Urban100 benchmark when quantizing EDSR to 2-bit and scaling up output images to x4.
arXiv Detail & Related papers (2022-03-08T04:26:18Z)
DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slice a part of network parameters for inputs with diverse difficulty levels. We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution. We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids. The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z)
Revisiting Dynamic Convolution via Matrix Decomposition [81.89967403872147]
We propose dynamic channel fusion to replace dynamic attention over channel groups. Our method is easier to train and requires significantly fewer parameters without sacrificing accuracy.
arXiv Detail & Related papers (2021-03-15T23:03:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.