Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN
- URL: http://arxiv.org/abs/2309.01439v3
- Date: Fri, 20 Oct 2023 03:28:16 GMT
- Title: Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN
- Authors: Kin Wai Lau, Lai-Man Po, Yasar Abbas Ur Rehman
- Abstract summary: We propose a family of Large Separable Kernel Attention modules, termed LSKA.
LSKA decomposes the 2D convolutional kernel of the depth-wise convolutional layer into cascaded horizontal and vertical 1-D kernels.
We demonstrate that the proposed LSKA design biases the VAN more toward the shape of the object than the texture with increasing kernel size.
- Score: 16.751500508997264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance that surpasses Vision Transformers (ViTs) on a range of vision-based tasks. However, the depth-wise
convolutional layer in these LKA modules incurs a quadratic increase in the
computational and memory footprints with increasing convolutional kernel size.
To mitigate these problems and to enable the use of extremely large
convolutional kernels in the attention modules of VAN, we propose a family of
Large Separable Kernel Attention modules, termed LSKA. LSKA decomposes the 2D
convolutional kernel of the depth-wise convolutional layer into cascaded
horizontal and vertical 1-D kernels. In contrast to the standard LKA design,
the proposed decomposition enables the direct use of the depth-wise
convolutional layer with large kernels in the attention module, without
requiring any extra blocks. We demonstrate that the proposed LSKA module in VAN achieves performance comparable to the standard LKA module while incurring lower computational complexity and a smaller memory footprint. We also find that the proposed LSKA design biases VAN more toward the shape of objects than their texture as the kernel size increases. Additionally, we benchmark the robustness of the
LKA and LSKA in VAN, ViTs, and the recent ConvNeXt on the five corrupted
versions of the ImageNet dataset that are largely unexplored in the previous
works. Our extensive experimental results show that the proposed LSKA module in VAN provides a significant reduction in computational complexity and memory footprint with increasing kernel size, while outperforming ViTs and ConvNeXt and matching the performance of the LKA module in VAN on object recognition, object detection, semantic segmentation, and robustness tests.
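To make the decomposition concrete, below is a minimal PyTorch sketch of an LSKA-style block. It assumes the usual LKA layout (a local depth-wise convolution, a depth-wise dilated convolution for long-range context, and a point-wise convolution), with each 2D depth-wise kernel separated into cascaded horizontal and vertical 1-D kernels. The kernel sizes and dilation are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Large Separable Kernel Attention (sketch).

    Each 2D depth-wise convolution of the LKA layout is replaced by a
    cascade of horizontal (1 x k) and vertical (k x 1) depth-wise
    convolutions, so per-channel weights drop from k^2 to 2k.
    Kernel sizes and dilation below are illustrative assumptions.
    """

    def __init__(self, dim: int, k: int = 5, kd: int = 7, dilation: int = 3):
        super().__init__()
        # local context: separable depth-wise convolution (1 x k, then k x 1)
        self.conv_h = nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim)
        self.conv_v = nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim)
        # long-range context: separable dilated depth-wise convolution
        pad = (kd // 2) * dilation
        self.conv_dh = nn.Conv2d(dim, dim, (1, kd), padding=(0, pad),
                                 dilation=dilation, groups=dim)
        self.conv_dv = nn.Conv2d(dim, dim, (kd, 1), padding=(pad, 0),
                                 dilation=dilation, groups=dim)
        self.conv_pw = nn.Conv2d(dim, dim, 1)  # point-wise channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.conv_v(self.conv_h(x))
        attn = self.conv_dv(self.conv_dh(attn))
        attn = self.conv_pw(attn)
        return x * attn  # reweight the input, as in LKA


x = torch.randn(1, 64, 56, 56)
print(LSKA(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

A k x k depth-wise kernel stores k^2 weights per channel, whereas the cascaded 1-D pair stores only 2k, which is why the savings grow with kernel size.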
Related papers
- Large coordinate kernel attention network for lightweight image super-resolution [5.66935513638074]
We propose multi-scale blueprint separable convolutions (MBSConv) as a highly efficient building block with a multi-scale receptive field.
We also propose a large coordinate kernel attention (LCKA) module which decomposes the 2D convolutional kernels of the depth-wise convolutional layers in LKA into horizontal and vertical 1-D kernels.
arXiv Detail & Related papers (2024-05-15T14:03:38Z)
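Purely as a hedged illustration of the MBSConv idea in the entry above: a blueprint-separable convolution applies a point-wise convolution first and depth-wise filtering second, and a multi-scale variant can run several depth-wise branches with different kernel sizes in parallel. The branch sizes and fusion-by-summation below are assumptions; the paper's exact design may differ.

```python
import torch.nn as nn

class MBSConvSketch(nn.Module):
    """Multi-scale blueprint-separable convolution (assumed structure)."""

    def __init__(self, in_ch: int, out_ch: int, scales=(3, 5, 7)):
        super().__init__()
        self.pw = nn.Conv2d(in_ch, out_ch, 1)  # blueprint: point-wise first
        self.dw = nn.ModuleList(
            nn.Conv2d(out_ch, out_ch, k, padding=k // 2, groups=out_ch)
            for k in scales  # parallel depth-wise branches, one per scale
        )

    def forward(self, x):
        x = self.pw(x)
        return sum(dw(x) for dw in self.dw)  # fuse scales by summation
```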
- Enhancing Retinal Vascular Structure Segmentation in Images With a Novel Design Two-Path Interactive Fusion Module Model [6.392575673488379]
We introduce Swin-Res-Net, a specialized module designed to enhance the precision of retinal vessel segmentation.
Swin-Res-Net utilizes the Swin transformer, which partitions feature maps using shifted windows with displacement.
Our proposed architecture produces outstanding results, meeting or surpassing those of other published models.
arXiv Detail & Related papers (2024-03-03T01:36:11Z)
- Accelerating Machine Learning Primitives on Commodity Hardware [0.0]
We present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in Deep Neural Networks (DNNs).
Our results suggest that the Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated hardware accelerators.
This could promote a wider adoption of AI on low-power and low-memory devices without the need for specialized hardware.
arXiv Detail & Related papers (2023-10-08T16:26:18Z)
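For intuition about the trade-off in the entry above, here is a deliberately naive NumPy contrast between im2col+GEMM convolution and direct sliding-window convolution. The study's optimized kernels are far more sophisticated; this sketch only shows where the extra im2col memory comes from.

```python
import numpy as np

def conv2d_gemm(x, w):
    """im2col + GEMM: materialize every receptive field as a column,
    then do one large matrix multiply (extra O(C*kh*kw*oh*ow) buffer)."""
    C, H, W = x.shape
    K, _, kh, kw = w.shape  # (out_ch, in_ch, kh, kw), stride 1, no padding
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[:, i:i + kh, j:j + kw].ravel()
    return (w.reshape(K, -1) @ cols).reshape(K, oh, ow)

def conv2d_sliding(x, w):
    """Direct sliding window: accumulate in place, no im2col buffer."""
    C, H, W = x.shape
    K, _, kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.zeros((K, oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[:, i, j] = np.tensordot(w, x[:, i:i + kh, j:j + kw], axes=3)
    return out

x = np.random.randn(3, 16, 16)
w = np.random.randn(8, 3, 3, 3)
assert np.allclose(conv2d_gemm(x, w), conv2d_sliding(x, w))
```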
- Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module -- the SAM-guidEd refinEment Module (SEEM).
This light-weight plug-in module is specifically designed to leverage the attention mechanism for the generation of semantic-aware features.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z)
- Incorporating Transformer Designs into Convolutions for Lightweight Image Super-Resolution [46.32359056424278]
Large convolutional kernels have become popular in designing convolutional neural networks.
The increase in kernel size also leads to a quadratic growth in the number of parameters, resulting in heavy computation and memory requirements.
We propose a neighborhood attention (NA) module that upgrades the standard convolution with a self-attention mechanism.
Building upon the NA module, we propose a lightweight single image super-resolution (SISR) network named TCSR.
arXiv Detail & Related papers (2023-03-25T01:32:18Z)
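As a rough, single-head sketch of the neighborhood-attention idea in the entry above, where each pixel attends only to its own k x k spatial neighborhood. The actual NA module in TCSR likely differs (multiple heads, normalization, and positional terms are not modeled here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborhoodAttention(nn.Module):
    """Each pixel attends to its k x k neighborhood (single-head sketch)."""

    def __init__(self, dim: int, k: int = 7):
        super().__init__()
        self.k = k
        self.qkv = nn.Conv2d(dim, 3 * dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        B, C, H, W = x.shape
        q, key, v = self.qkv(x).chunk(3, dim=1)
        pad = self.k // 2
        # gather the k*k neighbours of every pixel: (B, C, k*k, H*W)
        k_n = F.unfold(key, self.k, padding=pad).view(B, C, self.k**2, H * W)
        v_n = F.unfold(v, self.k, padding=pad).view(B, C, self.k**2, H * W)
        q = q.view(B, C, 1, H * W)
        attn = (q * k_n).sum(1, keepdim=True) / C**0.5  # (B, 1, k*k, H*W)
        attn = attn.softmax(dim=2)                      # over the neighbourhood
        out = (attn * v_n).sum(2).view(B, C, H, W)
        return self.proj(out)
```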
- LKD-Net: Large Kernel Convolution Network for Single Image Dehazing [70.46392287128307]
We propose a novel Large Kernel Convolution Dehaze Block (LKD Block) consisting of the Decomposition depth-wise Large Kernel Convolution Block (DLKCB) and the Channel Enhanced Feed-forward Network (CEFN).
The designed DLKCB splits the depth-wise large kernel convolution into a smaller depth-wise convolution and a depth-wise dilated convolution without introducing massive parameters or computational overhead.
Our LKD-Net dramatically outperforms the Transformer-based method Dehamer with only 1.79% of its parameters and 48.9% of its FLOPs.
arXiv Detail & Related papers (2022-09-05T06:56:48Z)
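A minimal sketch of the split described in the entry above, assuming the common LKA-style rule of a small dense depth-wise kernel followed by a dilated one; the kernel sizes are illustrative, not LKD-Net's exact configuration.

```python
import torch.nn as nn

def decomposed_large_kernel(dim: int, k_small: int = 5,
                            k_dilated: int = 7, d: int = 3) -> nn.Sequential:
    """Approximate one large depth-wise convolution with a small depth-wise
    conv followed by a depth-wise dilated conv (DLKCB-style split).
    Effective receptive field: k_small + (k_dilated - 1) * d = 5 + 18 = 23,
    at a fraction of the parameters of a dense 23 x 23 kernel."""
    return nn.Sequential(
        nn.Conv2d(dim, dim, k_small, padding=k_small // 2, groups=dim),
        nn.Conv2d(dim, dim, k_dilated, padding=(k_dilated // 2) * d,
                  dilation=d, groups=dim),
    )
```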
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The Full-duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
Our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously, before the fusion and decoding stage.
We show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
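For flavor, here is a generic random-features sketch for the order-1 arc-cosine kernel, a standard building block of the fully-connected ReLU NTK; this is not the paper's specific construction, only the textbook random-features recipe.

```python
import numpy as np

def arc_cosine_features(x: np.ndarray, m: int = 4096, seed: int = 0) -> np.ndarray:
    """Random ReLU features: phi(x) . phi(y) approximates
    E_w[relu(w.x) * relu(w.y)], the order-1 arc-cosine kernel that
    appears inside the ReLU NTK. x has shape (d,); output has shape (m,)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, x.shape[-1]))   # w ~ N(0, I)
    return np.sqrt(2.0 / m) * np.maximum(W @ x, 0.0)
```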
- Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution [79.97180849505294]
We propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet, to enhance the spatial resolution of hyperspectral images (HSI).
Experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models.
arXiv Detail & Related papers (2020-07-10T08:08:20Z)
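To illustrate the mechanism named in the entry above, a generic cross-attention sketch in which queries come from one modality and keys/values from the other, so each branch borrows detail from its counterpart; CUCaNet's actual coupled unmixing design is more involved than this.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Single-head cross-attention between two token sequences (sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (batch, tokens, dim); queries from a, keys/values from b
        attn = (self.q(a) @ self.k(b).transpose(-2, -1)) * self.scale
        return attn.softmax(dim=-1) @ self.v(b)
```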
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.