Revisiting Linformer with a modified self-attention with linear
complexity
- URL: http://arxiv.org/abs/2101.10277v1
- Date: Wed, 16 Dec 2020 13:23:29 GMT
- Title: Revisiting Linformer with a modified self-attention with linear
complexity
- Authors: Madhusudan Verma
- Abstract summary: I propose an alternative method for self-attention with linear complexity in time and space.
Since this method works for long sequences, it can be used for images as well as audio.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although Transformer models such as Google's BERT and OpenAI's GPT-3 are
successful in many natural language processing tasks, training and deploying
these models is costly and inefficient. Even when pre-trained models are used,
deployment remains a challenge due to their large size. Beyond deployment,
these models also incur high inference latency, which limits
user-friendliness. The main bottleneck is self-attention, which requires
quadratic time and space with respect to the sequence length. To reduce this
quadratic complexity, Facebook's AI research team introduced Linformer,
showing that the self-attention matrix can be approximated by a low-rank
matrix and, exploiting this finding, proposing a new self-attention mechanism
with linear time and space complexity. In Linformer, however, the time
complexity depends on the projection mapping dimension, which acts as a
hyperparameter and affects the performance of the model; tuning this
hyperparameter can be time-consuming. In this paper, I propose an alternative
method for self-attention with linear complexity in time and space that is
independent of the projection mapping dimension. Since this method works for
long sequences, it can be used for images as well as audio.
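
As a point of reference for the quadratic bottleneck described in the abstract, here is a minimal NumPy sketch of standard scaled dot-product self-attention; the function names, shapes, and toy sizes are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_self_attention(Q, K, V):
    """Scaled dot-product self-attention for a sequence of length n.

    Q, K, V: (n, d) arrays. The (n, n) score matrix built below is
    the quadratic time/space bottleneck the abstract refers to.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)        # (n, n): O(n^2) time and space
    return softmax(scores, axis=-1) @ V  # (n, d)

# Toy usage: n = 512 tokens with d = 64 features per token.
rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(standard_self_attention(Q, K, V).shape)  # (512, 64)
```

Doubling the sequence length n quadruples the size of the score matrix, which is why long inputs such as images or audio quickly become impractical with this formulation.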
Related papers
- Diffusion Auto-regressive Transformer for Effective Self-supervised Time Series Forecasting [47.58016750718323]
We propose a novel generative self-supervised method called TimeDART.
TimeDART captures both the global sequence dependence and local detail features within time series data.
Our code is publicly available at https://github.com/Melmaphother/TimeDART.
arXiv Detail & Related papers (2024-10-08T06:08:33Z) - UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z) - LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
Self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps compensate for the lack of long-range dependencies.
arXiv Detail & Related papers (2024-04-17T08:26:34Z) - Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs [50.25683648762602]
We introduce Koopman VAE, a new generative framework that is based on a novel design for the model prior.
Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map.
KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks.
arXiv Detail & Related papers (2023-10-04T07:14:43Z) - DAE-Former: Dual Attention-guided Efficient Transformer for Medical
Image Segmentation [3.9548535445908928]
We propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism.
Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights.
arXiv Detail & Related papers (2022-12-27T14:39:39Z) - Triformer: Triangular, Variable-Specific Attentions for Long Sequence
Multivariate Time Series Forecasting--Full Version [50.43914511877446]
We propose a triangular, variable-specific attention to ensure high efficiency and accuracy.
We show that Triformer outperforms state-of-the-art methods w.r.t. both accuracy and efficiency.
arXiv Detail & Related papers (2022-04-28T20:41:49Z) - Sketching as a Tool for Understanding and Accelerating Self-attention
for Long Sequences [52.6022911513076]
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules.
Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively.
Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention.
arXiv Detail & Related papers (2021-12-10T06:58:05Z) - Linformer: Self-Attention with Linear Complexity [36.5703957318311]
Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications.
The standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to sequence length.
We propose a new self-attention mechanism, which reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$ in both time and space (see the sketch after this list).
arXiv Detail & Related papers (2020-06-08T17:37:52Z)
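
To make the Linformer complexity claim concrete, below is a hedged NumPy sketch of a Linformer-style low-rank attention. In the actual model the projection matrices E and F are learned; here they are random placeholders, and the function and parameter names are my own, not the library's or the paper's.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_style_attention(Q, K, V, k=64, rng=None):
    """Low-rank self-attention in the spirit of Linformer.

    Keys and values of length n are projected down to k rows with
    projection matrices E, F of shape (k, n) (random here, learned in
    the real model), so the score matrix is (n, k) instead of (n, n):
    O(n*k) time and space. The projection dimension k is exactly the
    hyperparameter whose tuning the main paper aims to avoid.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = Q.shape
    E = rng.standard_normal((k, n)) / np.sqrt(k)  # projects K: (n, d) -> (k, d)
    F = rng.standard_normal((k, n)) / np.sqrt(k)  # projects V: (n, d) -> (k, d)
    scores = Q @ (E @ K).T / np.sqrt(d)           # (n, k)
    return softmax(scores, axis=-1) @ (F @ V)     # (n, d)

# Smaller k is cheaper but approximates full attention more coarsely.
rng = np.random.default_rng(1)
n, d = 2048, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
for k in (32, 128, 256):
    print(k, linformer_style_attention(Q, K, V, k=k, rng=rng).shape)
```

The loop over k illustrates the trade-off the main paper highlights: the cost and approximation quality both depend on the projection dimension, which is why a method independent of that hyperparameter is attractive.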
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.