Revisiting Linformer with a modified self-attention with linear
complexity
- URL: http://arxiv.org/abs/2101.10277v1
- Date: Wed, 16 Dec 2020 13:23:29 GMT
- Title: Revisiting Linformer with a modified self-attention with linear
complexity
- Authors: Madhusudan Verma
- Abstract summary: I propose an alternative method for self-attention with linear complexity in time and space.
Since this method works for long sequences, it can be used for images as well as audio.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although Transformer models such as Google's BERT and OpenAI's GPT-3 are
successful in many natural language processing tasks, training and deploying
these models is costly and inefficient. Even when pre-trained models are used,
deployment remains a challenge due to their large size. Beyond deployment,
these models also incur high inference latency, which limits
user-friendliness. The main bottleneck is self-attention, which requires
quadratic time and space with respect to the sequence length. To reduce this
quadratic complexity, Facebook's AI research team introduced Linformer,
showing that the self-attention matrix can be approximated by a low-rank
matrix and, exploiting this finding, proposing a new self-attention mechanism
with linear time and space complexity. In Linformer, however, the time
complexity depends on the projection mapping dimension, which acts as a
hyperparameter and affects the performance of the model; tuning this
hyperparameter can be time-consuming. In this paper, I propose an alternative
method for self-attention with linear complexity in time and space that is
independent of the projection mapping dimension. Since this method works for
long sequences, it can be used for images as well as audio.
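
As a point of reference for the quadratic bottleneck described in the abstract, here is a minimal NumPy sketch of standard scaled dot-product self-attention; the function names, shapes, and toy sizes are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_self_attention(Q, K, V):
    """Scaled dot-product self-attention for a sequence of length n.

    Q, K, V: (n, d) arrays. The (n, n) score matrix built below is
    the quadratic time/space bottleneck the abstract refers to.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)        # (n, n): O(n^2) time and space
    return softmax(scores, axis=-1) @ V  # (n, d)

# Toy usage: n = 512 tokens with d = 64 features per token.
rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(standard_self_attention(Q, K, V).shape)  # (512, 64)
```

Doubling the sequence length n quadruples the size of the score matrix, which is why long inputs such as images or audio quickly become impractical with this formulation.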
Related papers
- Diffusion Auto-regressive Transformer for Effective Self-supervised Time Series Forecasting [47.58016750718323]
We propose a novel generative self-supervised method called TimeDART.
TimeDART captures both the global sequence dependence and local detail features within time series data.
Our code is publicly available at https://github.com/Melmaphother/TimeDART.
arXiv Detail & Related papers (2024-10-08T06:08:33Z) - UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z) - LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
Self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps compensate for the lack of long-range dependencies.
arXiv Detail & Related papers (2024-04-17T08:26:34Z) - Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs [50.25683648762602]
We introduce Koopman VAE, a new generative framework that is based on a novel design for the model prior.
Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map.
KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks.
arXiv Detail & Related papers (2023-10-04T07:14:43Z) - DAE-Former: Dual Attention-guided Efficient Transformer for Medical
Image Segmentation [3.9548535445908928]
We propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism.
Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights.
arXiv Detail & Related papers (2022-12-27T14:39:39Z) - Triformer: Triangular, Variable-Specific Attentions for Long Sequence
Multivariate Time Series Forecasting--Full Version [50.43914511877446]
We propose a triangular, variable-specific attention to ensure high efficiency and accuracy.
We show that Triformer outperforms state-of-the-art methods w.r.t. both accuracy and efficiency.
arXiv Detail & Related papers (2022-04-28T20:41:49Z) - Sketching as a Tool for Understanding and Accelerating Self-attention
for Long Sequences [52.6022911513076]
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules.
Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively.
Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention.
arXiv Detail & Related papers (2021-12-10T06:58:05Z) - Linformer: Self-Attention with Linear Complexity [36.5703957318311]
Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications.
The standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to sequence length.
We propose a new self-attention mechanism, which reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$ in both time and space (see the sketch after this list).
arXiv Detail & Related papers (2020-06-08T17:37:52Z)
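
To make the Linformer complexity claim concrete, below is a hedged NumPy sketch of a Linformer-style low-rank attention. In the actual model the projection matrices E and F are learned; here they are random placeholders, and the function and parameter names are my own, not the library's or the paper's.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_style_attention(Q, K, V, k=64, rng=None):
    """Low-rank self-attention in the spirit of Linformer.

    Keys and values of length n are projected down to k rows with
    projection matrices E, F of shape (k, n) (random here, learned in
    the real model), so the score matrix is (n, k) instead of (n, n):
    O(n*k) time and space. The projection dimension k is exactly the
    hyperparameter whose tuning the main paper aims to avoid.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = Q.shape
    E = rng.standard_normal((k, n)) / np.sqrt(k)  # projects K: (n, d) -> (k, d)
    F = rng.standard_normal((k, n)) / np.sqrt(k)  # projects V: (n, d) -> (k, d)
    scores = Q @ (E @ K).T / np.sqrt(d)           # (n, k)
    return softmax(scores, axis=-1) @ (F @ V)     # (n, d)

# Smaller k is cheaper but approximates full attention more coarsely.
rng = np.random.default_rng(1)
n, d = 2048, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
for k in (32, 128, 256):
    print(k, linformer_style_attention(Q, K, V, k=k, rng=rng).shape)
```

The loop over k illustrates the trade-off the main paper highlights: the cost and approximation quality both depend on the projection dimension, which is why a method independent of that hyperparameter is attractive.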
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.