Grouped self-attention mechanism for a memory-efficient Transformer
- URL: http://arxiv.org/abs/2210.00440v2
- Date: Thu, 6 Oct 2022 09:11:14 GMT
- Title: Grouped self-attention mechanism for a memory-efficient Transformer
- Authors: Bumjun Jung, Yusuke Mukuta, Tatsuya Harada
- Abstract summary: Real-world tasks such as forecasting weather, electricity consumption, and stock market involve predicting data that vary over time.
Time-series data are generally recorded over a long period of observation with long sequences owing to their periodic characteristics and long-range dependencies over time.
We propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA).
Our proposed model reduced computational complexity while achieving performance comparable to or better than existing methods.
- Score: 64.0125322353281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time-series data analysis is important because numerous real-world tasks such
as forecasting weather, electricity consumption, and stock market involve
predicting data that vary over time. Time-series data are generally recorded
over a long period of observation with long sequences owing to their periodic
characteristics and long-range dependencies over time. Thus, capturing
long-range dependency is an important factor in time-series data forecasting.
To address this, we propose two novel modules, Grouped Self-Attention
(GSA) and Compressed Cross-Attention (CCA). With both modules, the model
achieves $O(l)$ space and time complexity in the sequence length $l$, provided
the hyperparameters are kept small, and captures locality while still
considering global information. Experiments on time-series datasets show that
the proposed model reduces computational complexity while achieving performance
comparable to or better than existing methods.
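The abstract does not give the equations for GSA or CCA, so the following is only one plausible reading of the $O(l)$ claim, not the authors' implementation. It assumes that GSA restricts self-attention to non-overlapping groups of `group_size` positions, and that CCA lets decoder queries attend to an encoder memory average-pooled down to `n_compressed` tokens. Both mechanisms and all function names (`grouped_self_attention`, `compressed_cross_attention`, `softmax`) are hypothetical illustrations. When `group_size` and `n_compressed` are held small and fixed, the attention score matrices grow linearly in $l$, which is one way to read "under small hyperparameter limitations".

```python
import numpy as np

# Hypothetical sketch: NOT the GSA/CCA formulation from the paper, only an
# illustration of how grouping and compression keep attention cost linear in l.

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_self_attention(x, w_q, w_k, w_v, group_size):
    """Assumed GSA: each position attends only to positions in its own group.

    x: (l, d). The score tensor has (l / g) * g * g = l * g entries,
    i.e. O(l) for a fixed group size g.
    """
    l, _ = x.shape
    assert l % group_size == 0, "sketch assumes l is a multiple of group_size"
    n_groups = l // group_size
    # Split projected sequences into consecutive, non-overlapping groups.
    q = (x @ w_q).reshape(n_groups, group_size, -1)
    k = (x @ w_k).reshape(n_groups, group_size, -1)
    v = (x @ w_v).reshape(n_groups, group_size, -1)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (n_groups, g, g)
    out = softmax(scores) @ v                                 # (n_groups, g, d_v)
    return out.reshape(l, -1)

def compressed_cross_attention(queries, memory, w_q, w_k, w_v, n_compressed):
    """Assumed CCA: queries attend to memory average-pooled to n_compressed tokens.

    queries: (m, d), memory: (l, d). The score matrix is (m, n_compressed).
    """
    l, d = memory.shape
    assert l % n_compressed == 0, "sketch assumes l is a multiple of n_compressed"
    # Average each contiguous segment of l / n_compressed memory rows.
    pooled = memory.reshape(n_compressed, l // n_compressed, d).mean(axis=1)
    q = queries @ w_q
    k = pooled @ w_k
    v = pooled @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (m, n_compressed)
    return softmax(scores) @ v

# Usage with random data (all sizes are arbitrary illustration values).
rng = np.random.default_rng(0)
l, d, g, c = 96, 16, 8, 12
x = rng.standard_normal((l, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
enc = grouped_self_attention(x, w_q, w_k, w_v, g)
dec_queries = rng.standard_normal((24, d))
dec = compressed_cross_attention(dec_queries, enc, w_q, w_k, w_v, c)
print(enc.shape, dec.shape)                               # (96, 16) (24, 16)
print("full attention scores:", l * l, "grouped:", l * g)  # 9216 vs 768
```

At $l = 96$ and $g = 8$, full self-attention would store $l^2 = 9216$ scores, while the grouped variant stores $l \cdot g = 768$. In this sketch, smaller groups save more memory but narrow each position's receptive field; the pooled cross-attention step is one inexpensive way to bring global context back in.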
Related papers
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
Date: 2024-10-07T07:27:39Z
- TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting [49.6208017412376]
TimeBridge is a novel framework designed to bridge the gap between non-stationarity and dependency modeling.
TimeBridge consistently achieves state-of-the-art performance in both short-term and long-term forecasting.
Date: 2024-10-06T10:41:03Z
- MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with 0.1K Parameters [6.733646592789575]
Long-term Time Series Forecasting (LTSF) involves predicting long-term values by analyzing a large amount of historical time-series data to identify patterns and trends.
Transformer-based models offer high forecasting accuracy, but they are often too compute-intensive to be deployed on devices with hardware constraints.
We propose MixLinear, an ultra-lightweight time series forecasting model specifically designed for resource-constrained devices.
Date: 2024-10-02T23:04:57Z
- Test Time Learning for Time Series Forecasting [1.4605709124065924]
Test-Time Training (TTT) modules consistently outperform state-of-the-art models, including the Mamba-based TimeMachine.
Our results show significant improvements in Mean Squared Error (MSE) and Mean Absolute Error (MAE).
This work sets a new benchmark for time-series forecasting and lays the groundwork for future research in scalable, high-performance forecasting models.
Date: 2024-09-21T04:40:08Z
- Learning Graph Structures and Uncertainty for Accurate and Calibrated Time-series Forecasting [65.40983982856056]
We introduce STOIC, which leverages correlations between time series to learn the underlying structure among them and to provide well-calibrated and accurate forecasts.
Over a wide range of benchmark datasets, STOIC provides 16% more accurate and better-calibrated forecasts.
Date: 2024-07-02T20:14:32Z
- FAITH: Frequency-domain Attention In Two Horizons for Time Series Forecasting [13.253624747448935]
Time Series Forecasting plays a crucial role in various fields such as industrial equipment maintenance, meteorology, energy consumption, traffic flow and financial investment.
Current deep learning-based predictive models often exhibit a significant deviation between their forecasting outcomes and the ground truth.
We propose a novel model, Frequency-domain Attention In Two Horizons (FAITH), which decomposes time series into trend and seasonal components.
Date: 2024-05-22T02:37:02Z
- GinAR: An End-To-End Multivariate Time Series Forecasting Model Suitable for Variable Missing [21.980379175333443]
We propose a novel Graph Interpolation Attention Recursive Network (named GinAR) to model the spatial-temporal dependencies over the limited collected data for forecasting.
GinAR consists of two key components: attention and adaptive graph convolution.
Experiments conducted on five real-world datasets demonstrate that GinAR outperforms 11 SOTA baselines, and even when 90% of variables are missing, it can still accurately predict the future values of all variables.
Date: 2024-05-18T16:42:44Z
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
Date: 2022-02-07T21:37:29Z
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
Date: 2020-02-21T13:48:13Z