Grouped self-attention mechanism for a memory-efficient Transformer
- URL: http://arxiv.org/abs/2210.00440v2
- Date: Thu, 6 Oct 2022 09:11:14 GMT
- Title: Grouped self-attention mechanism for a memory-efficient Transformer
- Authors: Bumjun Jung, Yusuke Mukuta, Tatsuya Harada
- Abstract summary: Real-world tasks such as forecasting weather, electricity consumption, and stock market involve predicting data that vary over time.
Time-series data are generally recorded over a long period of observation with long sequences owing to their periodic characteristics and long-range dependencies over time.
We propose two novel modules: Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA).
Our proposed model reduces computational complexity while achieving performance comparable to or better than existing methods.
- Score: 64.0125322353281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time-series data analysis is important because numerous real-world tasks such
as forecasting weather, electricity consumption, and stock market involve
predicting data that vary over time. Time-series data are generally recorded
over a long period of observation with long sequences owing to their periodic
characteristics and long-range dependencies over time. Thus, capturing
long-range dependency is an important factor in time-series data forecasting.
To address these problems, we propose two novel modules, Grouped Self-Attention
(GSA) and Compressed Cross-Attention (CCA). With both modules, we achieve a
computational space and time complexity of order $O(l)$ for a sequence length
$l$ when the relevant hyperparameters are kept small, and can capture locality
while still considering global information. Experiments conducted on
time-series datasets show that the proposed model reduces computational
complexity while achieving performance comparable to or better than existing
methods.
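The abstract does not spell out the exact formulation of GSA, but the core idea it names, restricting self-attention to fixed-size groups of positions, is what yields the $O(l)$ space and time cost once the group size is held constant. The sketch below is only a minimal illustration of that idea in PyTorch; the function name, tensor shapes, and group size are illustrative assumptions rather than the authors' implementation, and the Compressed Cross-Attention module that reintroduces global context is omitted.

```python
# Minimal sketch of group-restricted self-attention (an assumption about the
# general idea behind GSA, not the paper's actual code): the sequence is split
# into non-overlapping groups of g consecutive positions, and attention is
# computed only within each group, so cost grows as O(l * g) ~ O(l) for fixed g.
import torch
import torch.nn.functional as F


def grouped_self_attention(x, w_q, w_k, w_v, group_size):
    """x: (batch, length, dim); length must be divisible by group_size."""
    b, l, d = x.shape
    g = group_size
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # per-token linear projections
    # Reshape so that attention is restricted to each group of g positions.
    q = q.reshape(b, l // g, g, d)
    k = k.reshape(b, l // g, g, d)
    v = v.reshape(b, l // g, g, d)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (b, l/g, g, g): one g x g block per group
    out = F.softmax(scores, dim=-1) @ v          # (b, l/g, g, d)
    return out.reshape(b, l, d)


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, length, dim, group = 2, 64, 16, 8
    x = torch.randn(batch, length, dim)
    w_q, w_k, w_v = (torch.randn(dim, dim) for _ in range(3))
    y = grouped_self_attention(x, w_q, w_k, w_v, group)
    print(y.shape)  # torch.Size([2, 64, 16])
```

Because each query attends only to the $g$ tokens in its own group, the attention matrices consist of $l/g$ blocks of size $g \times g$, so memory and time grow linearly in $l$ for a fixed $g$, rather than quadratically as in full self-attention.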
Related papers
- Learning Graph Structures and Uncertainty for Accurate and Calibrated Time-series Forecasting [65.40983982856056]
We introduce STOIC, which leverages correlations between time-series to learn their underlying structure and to provide well-calibrated and accurate forecasts.
Over a wide range of benchmark datasets, STOIC provides 16% more accurate and better-calibrated forecasts.
arXiv Detail & Related papers (2024-07-02T20:14:32Z)
- SAGDFN: A Scalable Adaptive Graph Diffusion Forecasting Network for Multivariate Time Series Forecasting [19.111041921060366]
We present a scalable Adaptive Graph Diffusion Forecasting Network (SAGDFN) to capture complex spatial-temporal correlation.
SAGDFN is scalable to datasets of thousands of nodes without the need of prior knowledge of spatial correlation.
It achieves performance comparable to state-of-the-art baselines on one real-world dataset of 207 nodes and outperforms all state-of-the-art baselines by a significant margin on three real-world datasets of 2,000 nodes.
arXiv Detail & Related papers (2024-06-18T05:19:51Z)
- FAITH: Frequency-domain Attention In Two Horizons for Time Series Forecasting [13.253624747448935]
Time Series Forecasting plays a crucial role in various fields such as industrial equipment maintenance, meteorology, energy consumption, traffic flow and financial investment.
Current deep learning-based predictive models often exhibit a significant deviation between their forecasting outcomes and the ground truth.
We propose a novel model, Frequency-domain Attention In Two Horizons (FAITH), which decomposes time series into trend and seasonal components.
arXiv Detail & Related papers (2024-05-22T02:37:02Z)
- GinAR: An End-To-End Multivariate Time Series Forecasting Model Suitable for Variable Missing [21.980379175333443]
We propose a novel Graph Interpolation Attention Recursive Network (named GinAR) to model the spatial-temporal dependencies over the limited collected data for forecasting.
GinAR consists of two key components: attention and adaptive graph convolution.
Experiments conducted on five real-world datasets demonstrate that GinAR outperforms 11 SOTA baselines, and even when 90% of variables are missing, it can still accurately predict the future values of all variables.
arXiv Detail & Related papers (2024-05-18T16:42:44Z)
- Robust Detection of Lead-Lag Relationships in Lagged Multi-Factor Models [61.10851158749843]
Key insights can be obtained by discovering lead-lag relationships inherent in the data.
We develop a clustering-driven methodology for robust detection of lead-lag relationships in lagged multi-factor models.
arXiv Detail & Related papers (2023-05-11T10:30:35Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
The estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
- PIETS: Parallelised Irregularity Encoders for Forecasting with Heterogeneous Time-Series [5.911865723926626]
Heterogeneity and irregularity of multi-source data sets present a significant challenge to time-series analysis.
In this work, we design a novel architecture, PIETS, to model heterogeneous time-series.
We show that PIETS is able to effectively model heterogeneous temporal data and outperforms other state-of-the-art approaches in the prediction task.
arXiv Detail & Related papers (2021-09-30T20:01:19Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example in which THP achieves improved prediction performance for learning multiple point processes by incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.