Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
- URL: http://arxiv.org/abs/2112.05359v1
- Date: Fri, 10 Dec 2021 06:58:05 GMT
- Title: Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
- Authors: Yifan Chen, Qi Zeng, Dilek Hakkani-Tur, Di Jin, Heng Ji, Yun Yang
- Abstract summary: Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules.
Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively.
Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention.
- Score: 52.6022911513076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based models are not efficient in processing long sequences due
to the quadratic space and time complexity of the self-attention modules. To
address this limitation, Linformer and Informer are proposed to reduce the
quadratic complexity to linear (modulo logarithmic factors) via low-dimensional
projection and row selection respectively. These two models are intrinsically
connected, and to understand their connection, we introduce a theoretical
framework of matrix sketching. Based on the theoretical analysis, we propose
Skeinformer to accelerate self-attention and further improve the accuracy of
matrix approximation to self-attention with three carefully designed
components: column sampling, adaptive row normalization and pilot sampling
reutilization. Experiments on the Long Range Arena (LRA) benchmark demonstrate
that our methods outperform alternatives with a consistently smaller time/space
footprint.
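Illustrative sketch (not the authors' released code): the abstract names column sampling and row normalization among Skeinformer's components, and the minimal PyTorch snippet below only illustrates the general idea of approximating softmax self-attention by scoring the queries against a sampled subset of key columns and re-normalizing each row. The function name sketched_attention, the argument num_samples, and the uniform sampler are hypothetical choices made for this example; the paper's actual sampling scheme, normalization, and pilot-sample reuse differ.

import torch

def sketched_attention(Q, K, V, num_samples=64):
    """Toy approximation of softmax attention via key-column sampling.

    Q, K, V: (n, d) tensors; num_samples: number of sampled key columns.
    This is a hypothetical illustration, not Skeinformer itself.
    """
    n, d = K.shape
    # Uniform column (key) sampling; the paper's sampling distribution may differ.
    idx = torch.randperm(n)[:num_samples]
    K_s, V_s = K[idx], V[idx]                      # (m, d) sampled keys/values
    # Unnormalized attention scores against the sampled keys only: (n, m)
    scores = Q @ K_s.T / d ** 0.5
    weights = torch.exp(scores - scores.max(dim=-1, keepdim=True).values)
    # Row normalization: each query's weights over the sampled keys sum to one,
    # standing in for the full softmax row sum.
    weights = weights / weights.sum(dim=-1, keepdim=True)
    return weights @ V_s                           # (n, d) approximate output

if __name__ == "__main__":
    n, d = 4096, 64
    Q, K, V = (torch.randn(n, d) for _ in range(3))
    out = sketched_attention(Q, K, V, num_samples=128)
    print(out.shape)  # torch.Size([4096, 64])

Because only num_samples key columns are scored, the cost is O(n * m * d) rather than O(n^2 * d), which is the kind of saving the sketching framework is meant to explain.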
Related papers
- Fast Dual-Regularized Autoencoder for Sparse Biological Data [65.268245109828]
We develop a shallow autoencoder for the dual neighborhood-regularized matrix completion problem.
We demonstrate the speed and accuracy advantage of our approach over the existing state-of-the-art in predicting drug-target interactions and drug-disease associations.
arXiv Detail & Related papers (2024-01-30T01:28:48Z)
- Automatic dimensionality reduction of Twin-in-the-Loop Observers [1.6877390079162282]
This paper aims to find a procedure to tune the high-complexity observer by lowering its dimensionality.
The strategies have been validated for speed and yaw-rate estimation on real-world data.
arXiv Detail & Related papers (2024-01-18T10:14:21Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the covariance matrices involved.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Efficient Interpretable Nonlinear Modeling for Multiple Time Series [5.448070998907116]
This paper proposes an efficient nonlinear modeling approach for multiple time series.
It incorporates nonlinear interactions among different time-series variables.
Experimental results show that the proposed algorithm improves the identification of the support of the VAR coefficients in a parsimonious manner.
arXiv Detail & Related papers (2023-09-29T11:42:59Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer-based models have been adopted for their high prediction capacity, although their self-attention mechanism is computationally expensive.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting--Full Version [50.43914511877446]
We propose a triangular, variable-specific attention to ensure high efficiency and accuracy.
We show that Triformer outperforms state-of-the-art methods w.r.t. both accuracy and efficiency.
arXiv Detail & Related papers (2022-04-28T20:41:49Z)
- Predicting Attention Sparsity in Transformers [0.9786690381850356]
We propose Sparsefinder, a model trained to identify the sparsity pattern of entmax attention before computing it.
Our work provides a new angle to study model efficiency by doing extensive analysis of the tradeoff between the sparsity and recall of the predicted attention graph.
arXiv Detail & Related papers (2021-09-24T20:51:21Z)
- Revisiting Linformer with a modified self-attention with linear complexity [0.0]
I propose an alternative method for self-attention with linear complexity in time and space.
Since this method works for long sequences, it can be used for images as well as audio.
arXiv Detail & Related papers (2020-12-16T13:23:29Z)
- Multi-Objective Matrix Normalization for Fine-grained Visual Recognition [153.49014114484424]
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC).
Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features.
We propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation.
arXiv Detail & Related papers (2020-03-30T08:40:35Z)