Partial Convolution Meets Visual Attention
- URL: http://arxiv.org/abs/2503.03148v1
- Date: Wed, 05 Mar 2025 03:42:59 GMT
- Title: Partial Convolution Meets Visual Attention
- Authors: Haiduo Huang, Fuwei Yang, Dong Li, Ji Liu, Lu Tian, Jinzhang Peng, Pengju Ren, Emad Barsoum
- Abstract summary: FasterNet attempts to introduce partial convolution but compromises the accuracy due to underutilized channels. We introduce a novel Partial visual ATtention mechanism (PAT) that can efficiently combine PConv with visual attention. Building upon PAT, we propose a novel hybrid network family, named PATNet, which achieves superior top-1 accuracy and inference speed.
- Score: 16.646684431676665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing an efficient and effective neural network has remained a prominent topic in computer vision research. Depthwise convolution (DWConv) is widely used in efficient CNNs or ViTs, but it needs frequent memory access during inference, which leads to low throughput. FasterNet attempts to introduce partial convolution (PConv) as an alternative to DWConv but compromises the accuracy due to underutilized channels. To remedy this shortcoming and consider the redundancy between feature map channels, we introduce a novel Partial visual ATtention mechanism (PAT) that can efficiently combine PConv with visual attention. Our exploration indicates that the partial attention mechanism can completely replace the full attention mechanism and reduce model parameters and FLOPs. Our PAT can derive three types of blocks: Partial Channel-Attention block (PAT_ch), Partial Spatial-Attention block (PAT_sp) and Partial Self-Attention block (PAT_sf). First, PAT_ch integrates the enhanced Gaussian channel attention mechanism to infuse global distribution information into the untouched channels of PConv. Second, we introduce the spatial-wise attention to the MLP layer to further improve model accuracy. Finally, we replace PAT_ch in the last stage with the self-attention mechanism to extend the global receptive field. Building upon PAT, we propose a novel hybrid network family, named PATNet, which achieves superior top-1 accuracy and inference speed compared to FasterNet on ImageNet-1K classification and excels in both detection and segmentation on the COCO dataset. Particularly, our PATNet-T2 achieves 1.3% higher accuracy than FasterNet-T2, while exhibiting 25% higher GPU throughput and 24% lower CPU latency.
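The core idea of the abstract, combining a partial convolution that touches only a channel subset with a channel-attention branch that reweights the untouched channels, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name, split ratio, and the squeeze-and-excitation-style attention branch are assumptions for illustration (the paper's PAT_ch uses an enhanced Gaussian channel attention, which is not reproduced here).

```python
import torch
import torch.nn as nn

class PartialConvChannelAttention(nn.Module):
    """Hypothetical PAT_ch-style block: a 3x3 conv updates only the first
    1/n_div of the channels (partial convolution), while a lightweight
    channel-attention branch injects global statistics into the rest."""

    def __init__(self, dim: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = dim // n_div              # channels processed by PConv
        self.dim_untouched = dim - self.dim_conv  # channels PConv leaves alone
        # spatial mixing on the "partial" channel slice only
        self.pconv = nn.Conv2d(self.dim_conv, self.dim_conv, 3,
                               padding=1, bias=False)
        # stand-in channel attention for the untouched channels:
        # global average pool -> 1x1 conv -> sigmoid gate
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(self.dim_untouched, self.dim_untouched, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        x1 = self.pconv(x1)          # local spatial mixing on a channel subset
        x2 = x2 * self.attn(x2)      # global reweighting of untouched channels
        return torch.cat([x1, x2], dim=1)
```

With n_div = 4, only a quarter of the channels pass through the convolution, which is what keeps PConv's memory access low; the attention branch adds global information to the remaining three quarters at negligible FLOP cost.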
Related papers
- OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels [50.42092879252807]
We propose a novel pure ConvNet vision backbone, termed OverLoCK, which is carefully devised from both the architecture and mixer perspectives. Specifically, we introduce a biomimetic Deep-stage Decomposition Strategy (DDS) that fuses semantically meaningful context representations into middle and deep layers. To fully unleash the power of top-down context guidance, we further propose a novel Context-Mixing Dynamic Convolution (ContMix). Our OverLoCK exhibits notable performance improvement over existing methods.
arXiv Detail & Related papers (2025-02-27T13:45:15Z) - Partial Channel Network: Compute Fewer, Perform Better [6.666628122653455]
We propose a new partial channel mechanism (PCM) to exploit redundancy within feature map channels. We introduce a novel partial attention convolution (PATConv) that can efficiently combine convolution with visual attention. Building on PATConv and DPConv, we propose a new hybrid network family, named PartialNet, which achieves superior top-1 accuracy and inference speed.
arXiv Detail & Related papers (2025-02-03T12:26:55Z) - SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer or feature mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios.
arXiv Detail & Related papers (2023-12-01T08:22:34Z) - Dual Path Transformer with Partition Attention [26.718318398951933]
We present a novel attention mechanism, called dual attention, which is both efficient and effective.
We evaluate the effectiveness of our model on several computer vision tasks, including image classification on ImageNet, object detection on COCO, and semantic segmentation on Cityscapes.
The proposed DualFormer-XS achieves 81.5% top-1 accuracy on ImageNet, outperforming the recent state-of-the-art MPViT-XS by 0.6% top-1 accuracy with much higher throughput.
arXiv Detail & Related papers (2023-05-24T06:17:53Z) - Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech
Attacks [67.7648985513978]
Existing approaches for anti-spoofing in automatic speaker verification (ASV) still lack generalizability to unseen attacks.
We present a novel, channel-wise gated Res2Net (CG-Res2Net), which modifies Res2Net to enable a channel-wise gating mechanism.
arXiv Detail & Related papers (2021-07-19T12:27:40Z) - CT-Net: Channel Tensorization Network for Video Classification [48.4482794950675]
3D convolution is powerful for video classification but often computationally expensive.
Most approaches fail to achieve a preferable balance between convolutional efficiency and feature-interaction sufficiency.
We propose a concise and novel Channel Tensorization Network (CT-Net).
Our CT-Net outperforms a number of recent SOTA approaches, in terms of accuracy and/or efficiency.
arXiv Detail & Related papers (2021-06-03T05:35:43Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy a large number of parameters and require heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z) - MicroNet: Towards Image Recognition with Extremely Low FLOPs [117.96848315180407]
MicroNet is an efficient convolutional neural network with extremely low computational cost.
A family of MicroNets achieve a significant performance gain over the state-of-the-art in the low FLOP regime.
For instance, MicroNet-M1 achieves 61.1% top-1 accuracy on ImageNet classification with 12 MFLOPs, outperforming MobileNetV3 by 11.3%.
arXiv Detail & Related papers (2020-11-24T18:59:39Z) - ULSAM: Ultra-Lightweight Subspace Attention Module for Compact
Convolutional Neural Networks [4.143032261649983]
"Ultra-Lightweight Subspace Attention Mechanism" (ULSAM) is end-to-end trainable and can be deployed as a plug-and-play module in compact convolutional neural networks (CNNs).
We achieve ≈13% and ≈25% reductions in both the FLOPs and parameter counts of MobileNet-V2, with a 0.27% and more-than-1% improvement in top-1 accuracy on the ImageNet-1K and fine-grained image classification datasets, respectively.
arXiv Detail & Related papers (2020-06-26T17:05:43Z)
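The subspace-attention idea described in the ULSAM entry above, splitting the channels into groups and learning one cheap attention map per group, can be sketched as follows. This is an illustrative approximation, not ULSAM's exact design: the class name, the per-subspace 1x1 convolution, and the sigmoid gating are assumptions.

```python
import torch
import torch.nn as nn

class SubspaceAttention(nn.Module):
    """Hypothetical ULSAM-style module: channels are split into `groups`
    subspaces, and each subspace learns a single-channel spatial attention
    map, keeping parameter and FLOP overhead extremely low."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0, "channels must divide evenly into groups"
        self.groups = groups
        sub = channels // groups
        # one 1x1 conv per subspace, producing a single-channel attention map
        self.maps = nn.ModuleList(
            [nn.Conv2d(sub, 1, kernel_size=1) for _ in range(groups)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.chunk(x, self.groups, dim=1)      # split into subspaces
        gated = [c * torch.sigmoid(m(c))                 # per-subspace gating
                 for c, m in zip(chunks, self.maps)]
        return torch.cat(gated, dim=1)                   # reassemble channels
```

Because each subspace needs only a single-output 1x1 convolution, the added parameter count is roughly `channels + groups`, which is what makes this family of modules practical for compact CNNs.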
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.