Is Attention Better Than Matrix Decomposition?
- URL: http://arxiv.org/abs/2109.04553v1
- Date: Thu, 9 Sep 2021 20:40:19 GMT
- Title: Is Attention Better Than Matrix Decomposition?
- Authors: Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin
- Abstract summary: We show that self-attention is not better than the matrix decomposition model for encoding long-distance dependencies.
We propose a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding.
Comprehensive experiments are conducted in the vision tasks where it is crucial to learn the global context.
- Score: 58.813382406412195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As an essential ingredient of modern deep learning, the attention
mechanism, especially self-attention, plays a vital role in discovering global
correlations. However, is hand-crafted attention irreplaceable when modeling the
global context? Our intriguing finding is that self-attention is not better
than the matrix decomposition (MD) model developed 20 years ago, in terms of
both performance and computational cost, for encoding long-distance dependencies.
We model the global context issue as a low-rank recovery problem and show that
its optimization algorithms can help design global information blocks. This
paper then proposes a series of Hamburgers, in which we employ the optimization
algorithms for solving MDs to factorize the input representations into
sub-matrices and reconstruct a low-rank embedding. Hamburgers with different
MDs can perform favorably against the popular global context module
self-attention when carefully coping with gradients back-propagated through
MDs. Comprehensive experiments are conducted in the vision tasks where it is
crucial to learn the global context, including semantic segmentation and image
generation, demonstrating significant improvements over self-attention and its
variants.
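The core idea above — factorizing the input representations into sub-matrices with an MD optimization algorithm and returning the low-rank reconstruction as the global context — can be illustrated with a minimal NumPy sketch. This uses plain NMF multiplicative updates on a (channels x pixels) feature matrix; the function name, rank, and step count are illustrative assumptions, not the authors' implementation (which also handles gradients through the MD carefully).

```python
import numpy as np

def hamburger_context(X, rank=8, steps=6, eps=1e-6):
    """Sketch of a Hamburger-style global context step:
    factorize the (channels x pixels) feature matrix X ~ D @ C
    with a few NMF multiplicative updates, then return the
    low-rank reconstruction D @ C as the global context."""
    d, n = X.shape
    rng = np.random.default_rng(0)
    D = rng.random((d, rank))   # dictionary sub-matrix
    C = rng.random((rank, n))   # code sub-matrix
    for _ in range(steps):
        # multiplicative updates for ||X - D C||_F^2 (Lee-Seung),
        # which keep both factors nonnegative
        C *= (D.T @ X) / (D.T @ D @ C + eps)
        D *= (X @ C.T) / (D @ C @ C.T + eps)
    return D @ C  # low-rank embedding of the input

# usage: a flattened, nonnegative (e.g. post-ReLU) feature map
X = np.abs(np.random.default_rng(1).standard_normal((64, 256)))
Z = hamburger_context(X, rank=8)
```

Note that NMF requires a nonnegative input, which is why the sketch takes absolute values; in a network this role would be played by a preceding ReLU.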
Related papers
- Sharing Key Semantics in Transformer Makes Efficient Image Restoration [148.22790334216117]
The self-attention mechanism, a cornerstone of Vision Transformers (ViTs), tends to encompass all global cues, even those from semantically unrelated objects or regions.
We propose boosting image restoration performance by sharing key semantics via a Transformer for IR (i.e., SemanIR).
arXiv Detail & Related papers (2024-05-30T12:45:34Z)
- AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation [4.618389486337933]
We propose AMMUNet, a UNet-based framework that employs multi-scale attention map merging.
The proposed AMMM effectively combines multi-scale attention maps into a unified representation using a fixed mask template.
We show that our approach achieves remarkable mean intersection over union (mIoU) scores of 75.48% on the Vaihingen dataset and an exceptional 77.90% on the Potsdam dataset.
arXiv Detail & Related papers (2024-04-20T15:23:15Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
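The LRSA mechanism described above — self-attention computed in a fixed low-resolution space and then mapped back — can be sketched in a few lines of NumPy. Shapes, the pooling factor, and the nearest-neighbour upsampling are illustrative assumptions, not the LRFormer implementation.

```python
import numpy as np

def low_res_self_attention(x, pool=4):
    """Sketch of the LRSA idea: average-pool the (c, h, w) feature
    map to a low resolution, run plain self-attention there, then
    upsample the attended result back to the input resolution."""
    c, h, w = x.shape
    hp, wp = h // pool, w // pool
    # average-pool to the low-resolution grid
    xl = x.reshape(c, hp, pool, wp, pool).mean(axis=(2, 4))
    t = xl.reshape(c, hp * wp).T                 # tokens: (hw_low, c)
    scores = t @ t.T / np.sqrt(c)                # attention logits
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)           # row-wise softmax
    yl = (a @ t).T.reshape(c, hp, wp)            # attended low-res map
    # nearest-neighbour upsample back to the input resolution
    return np.repeat(np.repeat(yl, pool, axis=1), pool, axis=2)
```

Because attention runs on the pooled grid, its quadratic cost depends on the low-resolution token count rather than the full image size.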
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Advancing Volumetric Medical Image Segmentation via Global-Local Masked Autoencoder [7.098796546778199]
Masked autoencoder (MAE) is a promising self-supervised pre-training technique.
GL-MAE is a simple yet effective self-supervised pre-training strategy.
arXiv Detail & Related papers (2023-06-15T07:32:10Z)
- Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation [102.25240608024063]
Referring image segmentation segments an image according to a natural-language expression.
We develop an algorithm that shifts from being localization-centric to segmentation-centric.
Compared to its counterparts, our method is both more versatile and more effective.
arXiv Detail & Related papers (2023-03-11T08:42:40Z)
- Realtime Global Attention Network for Semantic Segmentation [4.061739586881057]
We propose a realtime global attention network (RGANet) for semantic segmentation.
Integrating these global attention modules into a hierarchy of transformations improves evaluation-metric performance.
arXiv Detail & Related papers (2021-12-24T04:24:18Z)
- Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks [33.07113523598028]
We propose Attention Pruning (AP), a framework that observes attention patterns in a fixed dataset and generates a global sparseness mask.
AP saves 90% of attention computation for language modeling and about 50% for machine translation and GLUE tasks, maintaining result quality.
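The Attention Pruning recipe summarized above — observe attention patterns over a fixed dataset, then derive one global sparseness mask — can be sketched as follows. The function name, the averaging scheme, and the `keep` ratio are illustrative assumptions, not the AP paper's exact procedure.

```python
import numpy as np

def global_sparseness_mask(attn_maps, keep=0.1):
    """Sketch of the Attention Pruning idea: average the attention
    maps observed over a dataset, then keep only the globally
    strongest positions as a fixed binary mask."""
    mean_attn = np.mean(attn_maps, axis=0)       # (queries, keys)
    k = max(1, int(keep * mean_attn.size))
    thresh = np.sort(mean_attn, axis=None)[-k]   # k-th largest value
    return mean_attn >= thresh                   # boolean mask

# usage: maps collected from a dataset, one (16, 16) map per example
maps = np.random.default_rng(0).random((100, 16, 16))
mask = global_sparseness_mask(maps, keep=0.1)
# at inference, attention would be computed only where mask is True
```

Keeping 10% of positions corresponds to the roughly 90% computation savings the summary reports for language modeling.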
arXiv Detail & Related papers (2020-11-20T13:58:21Z)
- BayGo: Joint Bayesian Learning and Information-Aware Graph Optimization [48.30183416069897]
BayGo is a novel fully decentralized joint Bayesian learning and graph optimization framework.
We show that our framework achieves faster convergence and higher accuracy compared to fully-connected and star topology graphs.
arXiv Detail & Related papers (2020-11-09T11:16:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.