FsaNet: Frequency Self-attention for Semantic Segmentation
- URL: http://arxiv.org/abs/2211.15595v3
- Date: Wed, 26 Jul 2023 08:50:12 GMT
- Title: FsaNet: Frequency Self-attention for Semantic Segmentation
- Authors: Fengyu Zhang, Ashkan Panahi, Guangjun Gao
- Abstract summary: We propose a new self-attention mechanism with highly reduced computational complexity, up to a linear rate.
By ablation study, we show that low frequency self-attention can achieve very close or better performance relative to full frequency.
We show that frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ less FLOPs, and $97.56\% \sim 98.18\%$ less run time.
- Score: 5.495952636982018
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Considering the spectral properties of images, we propose a new
self-attention mechanism with highly reduced computational complexity, up to a
linear rate. To better preserve edges while promoting similarity within
objects, we propose individualized processes over different frequency bands. In
particular, we study a case where the process is merely over low-frequency
components. By ablation study, we show that low frequency self-attention can
achieve very close or better performance relative to full frequency even
without retraining the network. Accordingly, we design and embed novel
plug-and-play modules to the head of a CNN network that we refer to as FsaNet.
The frequency self-attention 1) requires only a few low frequency coefficients
as input, 2) can be mathematically equivalent to spatial domain self-attention
with linear structures, 3) simplifies the token mapping ($1\times1$ convolution)
stage and token mixing stage simultaneously. We show that frequency
self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim
98.07\%$ less FLOPs, and $97.56\% \sim 98.18\%$ less run time than the regular
self-attention. Compared to other ResNet101-based self-attention networks,
FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscapes test set and
competitive results on ADE20K and VOCaug. FsaNet can also enhance Mask R-CNN for
instance segmentation on COCO. In addition, utilizing the proposed module,
Segformer can be boosted on a series of models with different scales, and
Segformer-B5 can be improved even without retraining. Code is accessible at
\url{https://github.com/zfy-csu/FsaNet}.
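As a rough illustration of points 1)-3) above, here is a minimal NumPy/SciPy sketch, not the authors' implementation (see the linked repository for that): self-attention is computed over only a k x k block of low-frequency 2D DCT coefficients of a feature map and written back before the inverse transform. The crop size k, the omission of learned token-mapping projections, and all shapes are illustrative assumptions; plain softmax attention is used here for familiarity, whereas the equivalence claim in point 2) relies on a linear attention structure.

```python
# Illustrative sketch only: attention over a few low-frequency 2D DCT coefficients.
import numpy as np
from scipy.fft import dctn, idctn


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def low_freq_self_attention(feat, k=8):
    """feat: (C, H, W) feature map; k: rows/cols of low-frequency DCT coefficients kept."""
    C, H, W = feat.shape

    # 2D DCT over the spatial axes; keep only the k x k low-frequency block as tokens.
    coeffs = dctn(feat, type=2, norm="ortho", axes=(1, 2))
    tokens = coeffs[:, :k, :k].reshape(C, k * k).T      # (k*k, C): few frequency tokens

    # Dot-product self-attention over the k*k tokens
    # (learned 1x1 token-mapping projections omitted for brevity).
    attn = softmax(tokens @ tokens.T / np.sqrt(C))       # (k*k, k*k)
    mixed = attn @ tokens                                # (k*k, C)

    # Write the attended coefficients back and return to the spatial domain.
    coeffs[:, :k, :k] = mixed.T.reshape(C, k, k)
    return idctn(coeffs, type=2, norm="ortho", axes=(1, 2))


out = low_freq_self_attention(np.random.randn(64, 128, 128), k=8)
print(out.shape)  # (64, 128, 128)
```

Because the token count is fixed at k*k regardless of the feature-map resolution, the attention cost no longer grows quadratically with H x W, which is the source of the memory, FLOP, and run-time savings quoted above.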
Related papers
- Attention as an RNN [66.5420926480473]
We show that attention can be viewed as a special Recurrent Neural Network (RNN) with the ability to compute its many-to-one RNN output efficiently.
We introduce a new efficient method of computing attention's many-to-many RNN output based on the parallel prefix scan algorithm (a minimal sketch of the recurrent, many-to-one view is given after this list).
We show Aarens achieve comparable performance to Transformers on $38$ datasets spread across four popular sequential problem settings.
arXiv Detail & Related papers (2024-05-22T19:45:01Z) - Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH); an illustrative LSH sketch is given after this list.
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - Recasting Self-Attention with Holographic Reduced Representations [31.89878931813593]
Motivated by problems in malware detection, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR)
We obtain several benefits including $\mathcal{O}(TH \log H)$ time complexity, $\mathcal{O}(TH)$ space complexity, and convergence in $10\times$ fewer epochs.
Our Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer.
arXiv Detail & Related papers (2023-05-31T03:42:38Z) - SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric
Kernels [69.47358238222586]
Toeplitz Neural Networks (TNNs) are a recent sequence model with impressive results.
We aim to reduce the O(n) computational complexity as well as the O(n) calls to the relative positional encoder (RPE) multi-layer perceptron (MLP) and the decay bias.
For bidirectional models, this motivates a sparse plus low-rank Toeplitz matrix decomposition.
arXiv Detail & Related papers (2023-05-15T21:25:35Z) - Parameterization of Cross-Token Relations with Relative Positional
Encoding for Vision MLP [52.25478388220691]
Vision multi-layer perceptrons (MLPs) have shown promising performance in computer vision tasks.
They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.
We propose a new positional spatial gating unit (PoSGU) to efficiently encode the cross-token relations for token mixing.
arXiv Detail & Related papers (2022-07-15T04:18:06Z) - Combiner: Full Attention Transformer with Sparse Computation Cost [142.10203598824964]
We propose Combiner, which provides full attention capability in each attention head while maintaining low computation complexity.
We show that most sparse attention patterns used in existing sparse transformers are able to inspire the design of such factorization for full attention.
An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach.
arXiv Detail & Related papers (2021-07-12T22:43:11Z) - SMYRF: Efficient Attention using Asymmetric Clustering [103.47647577048782]
We propose a novel type of balanced clustering algorithm to approximate attention.
SMYRF can be used as a drop-in replacement for dense attention layers without any retraining.
We show that SMYRF can be used interchangeably with dense attention before and after training.
arXiv Detail & Related papers (2020-10-11T18:49:17Z) - Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering [8.103294902922036]
We consider the design of two-pass voice trigger detection systems.
We focus on the networks in the second pass that are used to re-score candidate segments.
arXiv Detail & Related papers (2020-08-05T19:16:33Z) - ULSAM: Ultra-Lightweight Subspace Attention Module for Compact
Convolutional Neural Networks [4.143032261649983]
"Ultra-Lightweight Subspace Attention Mechanism" (ULSAM) is end-to-end trainable and can be deployed as a plug-and-play module in compact convolutional neural networks (CNNs)
We achieve $approx$13% and $approx$25% reduction in both the FLOPs and parameter counts of MobileNet-V2 with a 0.27% and more than 1% improvement in top-1 accuracy on the ImageNet-1K and fine-grained image classification datasets (respectively)
arXiv Detail & Related papers (2020-06-26T17:05:43Z) - Efficient Content-Based Sparse Attention with Routing Transformers [34.83683983648021]
Self-attention suffers from quadratic compute and memory requirements with respect to sequence length.
Our work proposes to learn dynamic sparse attention patterns that avoid allocating computation and memory to attend to content unrelated to the query of interest.
arXiv Detail & Related papers (2020-03-12T19:50:14Z)
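For the "Attention as an RNN" entry above, here is a minimal sketch, assuming plain dot-product softmax attention, of the many-to-one recurrent view: a single query's attention output is computed by scanning over the keys and values while carrying only a running max, numerator, and denominator. It illustrates the recurrence only; the paper's Aaren model and its parallel prefix-scan computation are not reproduced here.

```python
# Streaming (RNN-style) evaluation of softmax attention for one query: O(1) state.
import numpy as np


def attention_as_rnn(q, K, V):
    """q: (d,), K: (n, d), V: (n, d_v) -> softmax-attention output (d_v,)."""
    m = -np.inf                    # running max of scores (numerical stability)
    num = np.zeros(V.shape[1])     # running numerator:   sum_i exp(s_i - m) * v_i
    den = 0.0                      # running denominator: sum_i exp(s_i - m)
    for k_i, v_i in zip(K, V):
        s = q @ k_i
        m_new = max(m, s)
        scale = np.exp(m - m_new)              # rescale old state to the new max
        num = num * scale + np.exp(s - m_new) * v_i
        den = den * scale + np.exp(s - m_new)
        m = m_new
    return num / den


rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(10, 4)), rng.normal(size=(10, 4))
s = K @ q
w = np.exp(s - s.max()); w /= w.sum()          # reference softmax weights
assert np.allclose(attention_as_rnn(q, K, V), w @ V)
```

Evaluating this same recurrence over all prefixes with a parallel scan is the many-to-many case the paper targets.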
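For the HASTE entry above, here is a small illustrative sketch of the underlying LSH idea under stated assumptions: channels of a latent feature map are hashed with random hyperplanes, and colliding (i.e. similar) channels are averaged, shrinking the map. It is not the HASTE module itself; the number of hash bits and the merge-by-averaging rule are hypothetical choices.

```python
# Illustrative only: random-hyperplane LSH used to merge similar feature-map channels.
import numpy as np


def lsh_merge_channels(feat, n_bits=4, seed=0):
    """feat: (C, H, W) -> (B, H, W) with B <= 2**n_bits; colliding channels averaged."""
    C, H, W = feat.shape
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_bits, H * W))          # random hyperplanes

    flat = feat.reshape(C, H * W)
    codes = (flat @ planes.T > 0).astype(np.int64)     # sign pattern per channel
    keys = codes @ (1 << np.arange(n_bits))            # integer hash code per channel

    merged = [flat[keys == key].mean(axis=0) for key in np.unique(keys)]
    return np.stack(merged).reshape(-1, H, W)


out = lsh_merge_channels(np.random.randn(64, 16, 16))
print(out.shape)  # (B, 16, 16), B <= 16: channels with similar sign patterns merged
```

Channels whose activations point in similar directions tend to receive the same hash code, so redundant channels collapse into one representative, which is where the FLOP savings in such schemes come from.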