Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN
- URL: http://arxiv.org/abs/2309.01439v3
- Date: Fri, 20 Oct 2023 03:28:16 GMT
- Title: Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN
- Authors: Kin Wai Lau, Lai-Man Po, Yasar Abbas Ur Rehman
- Abstract summary: We propose a family of Large Separable Kernel Attention modules, termed LSKA.
LSKA decomposes the 2D convolutional kernel of the depth-wise convolutional layer into cascaded horizontal and vertical 1-D kernels.
We demonstrate that the proposed LSKA design biases the VAN more toward the shape of the object than the texture with increasing kernel size.
- Score: 16.751500508997264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance that surpasses Vision Transformers (ViTs) on a range of vision-based tasks. However, the depth-wise
convolutional layer in these LKA modules incurs a quadratic increase in the
computational and memory footprints with increasing convolutional kernel size.
To mitigate these problems and to enable the use of extremely large
convolutional kernels in the attention modules of VAN, we propose a family of
Large Separable Kernel Attention modules, termed LSKA. LSKA decomposes the 2D
convolutional kernel of the depth-wise convolutional layer into cascaded
horizontal and vertical 1-D kernels. In contrast to the standard LKA design,
the proposed decomposition enables the direct use of the depth-wise
convolutional layer with large kernels in the attention module, without
requiring any extra blocks. We demonstrate that the proposed LSKA module in VAN achieves performance comparable to the standard LKA module while incurring lower computational complexity and a smaller memory footprint. We also find that the proposed LSKA design biases VAN more toward the shape of objects than their texture as the kernel size increases. Additionally, we benchmark the robustness of the
LKA and LSKA in VAN, ViTs, and the recent ConvNeXt on the five corrupted
versions of the ImageNet dataset that are largely unexplored in the previous
works. Our extensive experimental results show that the proposed LSKA module in VAN provides a significant reduction in computational complexity and memory footprint with increasing kernel size, while outperforming ViTs and ConvNeXt and matching the performance of the LKA module in VAN on object recognition, object detection, semantic segmentation, and robustness tests.
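To make the decomposition concrete, below is a minimal PyTorch sketch of an LSKA-style block. It assumes the usual LKA layout (a local depth-wise convolution, a depth-wise dilated convolution for long-range context, and a point-wise convolution), with each 2D depth-wise kernel separated into cascaded horizontal and vertical 1-D kernels. The kernel sizes and dilation are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Large Separable Kernel Attention (sketch).

    Each 2D depth-wise convolution of the LKA layout is replaced by a
    cascade of horizontal (1 x k) and vertical (k x 1) depth-wise
    convolutions, so per-channel weights drop from k^2 to 2k.
    Kernel sizes and dilation below are illustrative assumptions.
    """

    def __init__(self, dim: int, k: int = 5, kd: int = 7, dilation: int = 3):
        super().__init__()
        # local context: separable depth-wise convolution (1 x k, then k x 1)
        self.conv_h = nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim)
        self.conv_v = nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim)
        # long-range context: separable dilated depth-wise convolution
        pad = (kd // 2) * dilation
        self.conv_dh = nn.Conv2d(dim, dim, (1, kd), padding=(0, pad),
                                 dilation=dilation, groups=dim)
        self.conv_dv = nn.Conv2d(dim, dim, (kd, 1), padding=(pad, 0),
                                 dilation=dilation, groups=dim)
        self.conv_pw = nn.Conv2d(dim, dim, 1)  # point-wise channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.conv_v(self.conv_h(x))
        attn = self.conv_dv(self.conv_dh(attn))
        attn = self.conv_pw(attn)
        return x * attn  # reweight the input, as in LKA


x = torch.randn(1, 64, 56, 56)
print(LSKA(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

A k x k depth-wise kernel stores k^2 weights per channel, whereas the cascaded 1-D pair stores only 2k, which is why the savings grow with kernel size.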
Related papers
- Large coordinate kernel attention network for lightweight image super-resolution [5.66935513638074]
We propose multi-scale blueprint separable convolutions (MBSConv) as a highly efficient building block with a multi-scale receptive field.
We also propose a large coordinate kernel attention (LCKA) module which decomposes the 2D convolutional kernels of the depth-wise convolutional layers in LKA into horizontal and vertical 1-D kernels.
arXiv Detail & Related papers (2024-05-15T14:03:38Z)
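Purely as a hedged illustration of the MBSConv idea in the entry above: a blueprint-separable convolution applies a point-wise convolution first and depth-wise filtering second, and a multi-scale variant can run several depth-wise branches with different kernel sizes in parallel. The branch sizes and fusion-by-summation below are assumptions; the paper's exact design may differ.

```python
import torch.nn as nn

class MBSConvSketch(nn.Module):
    """Multi-scale blueprint-separable convolution (assumed structure)."""

    def __init__(self, in_ch: int, out_ch: int, scales=(3, 5, 7)):
        super().__init__()
        self.pw = nn.Conv2d(in_ch, out_ch, 1)  # blueprint: point-wise first
        self.dw = nn.ModuleList(
            nn.Conv2d(out_ch, out_ch, k, padding=k // 2, groups=out_ch)
            for k in scales  # parallel depth-wise branches, one per scale
        )

    def forward(self, x):
        x = self.pw(x)
        return sum(dw(x) for dw in self.dw)  # fuse scales by summation
```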
- Enhancing Retinal Vascular Structure Segmentation in Images With a Novel Design Two-Path Interactive Fusion Module Model [6.392575673488379]
We introduce Swin-Res-Net, a specialized module designed to enhance the precision of retinal vessel segmentation.
Swin-Res-Net utilizes the Swin transformer, which partitions feature maps using shifted windows with displacement.
Our proposed architecture produces outstanding results, meeting or surpassing those of other published models.
arXiv Detail & Related papers (2024-03-03T01:36:11Z)
- Accelerating Machine Learning Primitives on Commodity Hardware [0.0]
We present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in Deep Neural Networks (DNNs).
Our results suggest that the Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated hardware accelerators.
This could promote a wider adoption of AI on low-power and low-memory devices without the need for specialized hardware.
arXiv Detail & Related papers (2023-10-08T16:26:18Z)
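For intuition about the trade-off in the entry above, here is a deliberately naive NumPy contrast between im2col+GEMM convolution and direct sliding-window convolution. The study's optimized kernels are far more sophisticated; this sketch only shows where the extra im2col memory comes from.

```python
import numpy as np

def conv2d_gemm(x, w):
    """im2col + GEMM: materialize every receptive field as a column,
    then do one large matrix multiply (extra O(C*kh*kw*oh*ow) buffer)."""
    C, H, W = x.shape
    K, _, kh, kw = w.shape  # (out_ch, in_ch, kh, kw), stride 1, no padding
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[:, i:i + kh, j:j + kw].ravel()
    return (w.reshape(K, -1) @ cols).reshape(K, oh, ow)

def conv2d_sliding(x, w):
    """Direct sliding window: accumulate in place, no im2col buffer."""
    C, H, W = x.shape
    K, _, kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.zeros((K, oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[:, i, j] = np.tensordot(w, x[:, i:i + kh, j:j + kw], axes=3)
    return out

x = np.random.randn(3, 16, 16)
w = np.random.randn(8, 3, 3, 3)
assert np.allclose(conv2d_gemm(x, w), conv2d_sliding(x, w))
```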
- Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module -- the SAM-guidEd refinEment Module (SEEM).
This light-weight plug-in module is specifically designed to leverage the attention mechanism for the generation of semantic-aware features.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z)
- Incorporating Transformer Designs into Convolutions for Lightweight Image Super-Resolution [46.32359056424278]
Large convolutional kernels have become popular in designing convolutional neural networks.
The increase in kernel size also leads to a quadratic growth in the number of parameters, resulting in heavy computation and memory requirements.
We propose a neighborhood attention (NA) module that upgrades the standard convolution with a self-attention mechanism.
Building upon the NA module, we propose a lightweight single image super-resolution (SISR) network named TCSR.
arXiv Detail & Related papers (2023-03-25T01:32:18Z)
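As a rough, single-head sketch of the neighborhood-attention idea in the entry above, where each pixel attends only to its own k x k spatial neighborhood. The actual NA module in TCSR likely differs (multiple heads, normalization, and positional terms are not modeled here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborhoodAttention(nn.Module):
    """Each pixel attends to its k x k neighborhood (single-head sketch)."""

    def __init__(self, dim: int, k: int = 7):
        super().__init__()
        self.k = k
        self.qkv = nn.Conv2d(dim, 3 * dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        B, C, H, W = x.shape
        q, key, v = self.qkv(x).chunk(3, dim=1)
        pad = self.k // 2
        # gather the k*k neighbours of every pixel: (B, C, k*k, H*W)
        k_n = F.unfold(key, self.k, padding=pad).view(B, C, self.k**2, H * W)
        v_n = F.unfold(v, self.k, padding=pad).view(B, C, self.k**2, H * W)
        q = q.view(B, C, 1, H * W)
        attn = (q * k_n).sum(1, keepdim=True) / C**0.5  # (B, 1, k*k, H*W)
        attn = attn.softmax(dim=2)                      # over the neighbourhood
        out = (attn * v_n).sum(2).view(B, C, H, W)
        return self.proj(out)
```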
- LKD-Net: Large Kernel Convolution Network for Single Image Dehazing [70.46392287128307]
We propose a novel Large Kernel Convolution Dehaze Block (LKD Block) consisting of the Decomposition depth-wise Large Kernel Convolution Block (DLKCB) and the Channel Enhanced Feed-forward Network (CEFN).
The designed DLKCB splits the depth-wise large kernel convolution into a smaller depth-wise convolution and a depth-wise dilated convolution without introducing massive parameters or computational overhead.
Our LKD-Net dramatically outperforms the Transformer-based method Dehamer with only 1.79% of its parameters and 48.9% of its FLOPs.
arXiv Detail & Related papers (2022-09-05T06:56:48Z)
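A minimal sketch of the split described in the entry above, assuming the common LKA-style rule of a small dense depth-wise kernel followed by a dilated one; the kernel sizes are illustrative, not LKD-Net's exact configuration.

```python
import torch.nn as nn

def decomposed_large_kernel(dim: int, k_small: int = 5,
                            k_dilated: int = 7, d: int = 3) -> nn.Sequential:
    """Approximate one large depth-wise convolution with a small depth-wise
    conv followed by a depth-wise dilated conv (DLKCB-style split).
    Effective receptive field: k_small + (k_dilated - 1) * d = 5 + 18 = 23,
    at a fraction of the parameters of a dense 23 x 23 kernel."""
    return nn.Sequential(
        nn.Conv2d(dim, dim, k_small, padding=k_small // 2, groups=dim),
        nn.Conv2d(dim, dim, k_dilated, padding=(k_dilated // 2) * d,
                  dilation=d, groups=dim),
    )
```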
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The Full-duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
Our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously, before the fusion and decoding stage.
We show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
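For flavor, here is a generic random-features sketch for the order-1 arc-cosine kernel, a standard building block of the fully-connected ReLU NTK; this is not the paper's specific construction, only the textbook random-features recipe.

```python
import numpy as np

def arc_cosine_features(x: np.ndarray, m: int = 4096, seed: int = 0) -> np.ndarray:
    """Random ReLU features: phi(x) . phi(y) approximates
    E_w[relu(w.x) * relu(w.y)], the order-1 arc-cosine kernel that
    appears inside the ReLU NTK. x has shape (d,); output has shape (m,)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, x.shape[-1]))   # w ~ N(0, I)
    return np.sqrt(2.0 / m) * np.maximum(W @ x, 0.0)
```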
- Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution [79.97180849505294]
We propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet, to enhance the spatial resolution of hyperspectral images (HSI).
Experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models.
arXiv Detail & Related papers (2020-07-10T08:08:20Z)
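To illustrate the mechanism named in the entry above, a generic cross-attention sketch in which queries come from one modality and keys/values from the other, so each branch borrows detail from its counterpart; CUCaNet's actual coupled unmixing design is more involved than this.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Single-head cross-attention between two token sequences (sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (batch, tokens, dim); queries from a, keys/values from b
        attn = (self.q(a) @ self.k(b).transpose(-2, -1)) * self.scale
        return attn.softmax(dim=-1) @ self.v(b)
```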
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.