Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
- URL: http://arxiv.org/abs/2112.05359v1
- Date: Fri, 10 Dec 2021 06:58:05 GMT
- Title: Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
- Authors: Yifan Chen, Qi Zeng, Dilek Hakkani-Tur, Di Jin, Heng Ji, Yun Yang
- Abstract summary: Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules.
Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively.
Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention.
- Score: 52.6022911513076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based models are not efficient in processing long sequences due
to the quadratic space and time complexity of the self-attention modules. To
address this limitation, Linformer and Informer are proposed to reduce the
quadratic complexity to linear (modulo logarithmic factors) via low-dimensional
projection and row selection respectively. These two models are intrinsically
connected, and to understand their connection, we introduce a theoretical
framework of matrix sketching. Based on the theoretical analysis, we propose
Skeinformer to accelerate self-attention and further improve the accuracy of
matrix approximation to self-attention with three carefully designed
components: column sampling, adaptive row normalization and pilot sampling
reutilization. Experiments on the Long Range Arena (LRA) benchmark demonstrate
that our methods outperform alternatives with a consistently smaller time/space
footprint.
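Illustrative sketch (not the authors' released code): the abstract names column sampling and row normalization among Skeinformer's components, and the minimal PyTorch snippet below only illustrates the general idea of approximating softmax self-attention by scoring the queries against a sampled subset of key columns and re-normalizing each row. The function name sketched_attention, the argument num_samples, and the uniform sampler are hypothetical choices made for this example; the paper's actual sampling scheme, normalization, and pilot-sample reuse differ.

import torch

def sketched_attention(Q, K, V, num_samples=64):
    """Toy approximation of softmax attention via key-column sampling.

    Q, K, V: (n, d) tensors; num_samples: number of sampled key columns.
    This is a hypothetical illustration, not Skeinformer itself.
    """
    n, d = K.shape
    # Uniform column (key) sampling; the paper's sampling distribution may differ.
    idx = torch.randperm(n)[:num_samples]
    K_s, V_s = K[idx], V[idx]                      # (m, d) sampled keys/values
    # Unnormalized attention scores against the sampled keys only: (n, m)
    scores = Q @ K_s.T / d ** 0.5
    weights = torch.exp(scores - scores.max(dim=-1, keepdim=True).values)
    # Row normalization: each query's weights over the sampled keys sum to one,
    # standing in for the full softmax row sum.
    weights = weights / weights.sum(dim=-1, keepdim=True)
    return weights @ V_s                           # (n, d) approximate output

if __name__ == "__main__":
    n, d = 4096, 64
    Q, K, V = (torch.randn(n, d) for _ in range(3))
    out = sketched_attention(Q, K, V, num_samples=128)
    print(out.shape)  # torch.Size([4096, 64])

Because only num_samples key columns are scored, the cost is O(n * m * d) rather than O(n^2 * d), which is the kind of saving the sketching framework is meant to explain.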
Related papers
- Fast Dual-Regularized Autoencoder for Sparse Biological Data [65.268245109828]
We develop a shallow autoencoder for the dual neighborhood-regularized matrix completion problem.
We demonstrate the speed and accuracy advantage of our approach over the existing state-of-the-art in predicting drug-target interactions and drug-disease associations.
arXiv Detail & Related papers (2024-01-30T01:28:48Z)
- Automatic dimensionality reduction of Twin-in-the-Loop Observers [1.6877390079162282]
This paper aims to find a procedure to tune the high-complexity observer by lowering its dimensionality.
The strategies have been validated for speed and yaw-rate estimation on real-world data.
arXiv Detail & Related papers (2024-01-18T10:14:21Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the covariance matrices involved.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Efficient Interpretable Nonlinear Modeling for Multiple Time Series [5.448070998907116]
This paper proposes an efficient nonlinear modeling approach for multiple time series.
It incorporates nonlinear interactions among different time-series variables.
Experimental results show that the proposed algorithm improves the identification of the support of the VAR coefficients in a parsimonious manner.
arXiv Detail & Related papers (2023-09-29T11:42:59Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer-based models have been adopted for their high prediction capacity, although their self-attention mechanism is computationally expensive.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting--Full Version [50.43914511877446]
We propose a triangular, variable-specific attention to ensure high efficiency and accuracy.
We show that Triformer outperforms state-of-the-art methods w.r.t. both accuracy and efficiency.
arXiv Detail & Related papers (2022-04-28T20:41:49Z)
- Predicting Attention Sparsity in Transformers [0.9786690381850356]
We propose Sparsefinder, a model trained to identify the sparsity pattern of entmax attention before computing it.
Our work provides a new angle to study model efficiency by doing extensive analysis of the tradeoff between the sparsity and recall of the predicted attention graph.
arXiv Detail & Related papers (2021-09-24T20:51:21Z)
- Revisiting Linformer with a modified self-attention with linear complexity [0.0]
I propose an alternative method for self-attention with linear complexity in time and space.
Since this method works for long sequences, it can be used for images as well as audio.
arXiv Detail & Related papers (2020-12-16T13:23:29Z)
- Multi-Objective Matrix Normalization for Fine-grained Visual Recognition [153.49014114484424]
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC).
Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features.
We propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation.
arXiv Detail & Related papers (2020-03-30T08:40:35Z)