Persistence Initialization: A novel adaptation of the Transformer
architecture for Time Series Forecasting
- URL: http://arxiv.org/abs/2208.14236v1
- Date: Tue, 30 Aug 2022 13:04:48 GMT
- Title: Persistence Initialization: A novel adaptation of the Transformer
architecture for Time Series Forecasting
- Authors: Espen Haugsdal, Erlend Aune, Massimiliano Ruocco
- Abstract summary: Time series forecasting is an important problem, with many real world applications.
We propose a novel adaptation of the original Transformer architecture focusing on the task of time series forecasting.
We use a decoder Transformer with ReZero normalization and Rotary positional encodings, but the adaptation is applicable to any auto-regressive neural network model.
- Score: 0.7734726150561088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time series forecasting is an important problem, with many real world
applications. Ensembles of deep neural networks have recently achieved
impressive forecasting accuracy, but such large ensembles are impractical in
many real world settings. Transformer models have been successfully applied to a
diverse set of challenging problems. We propose a novel adaptation of the
original Transformer architecture focusing on the task of time series
forecasting, called Persistence Initialization. The model is initialized as a
naive persistence model by using a multiplicative gating mechanism combined
with a residual skip connection. We use a decoder Transformer with ReZero
normalization and Rotary positional encodings, but the adaptation is applicable
to any auto-regressive neural network model. We evaluate our proposed
architecture on the challenging M4 dataset, achieving competitive performance
compared to ensemble based methods. We also compare against existing recently
proposed Transformer models for time series forecasting, showing superior
performance on the M4 dataset. Extensive ablation studies show that Persistence
Initialization leads to better performance and faster convergence. As the size
of the model increases, only the models with our proposed adaptation gain in
performance. We also perform an additional ablation study to determine the
importance of the choice of normalization and positional encoding, and find
both the use of Rotary encodings and ReZero normalization to be essential for
good forecasting performance.
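As a concrete illustration of the mechanism described in the abstract, below is a minimal PyTorch sketch (class and argument names are hypothetical, not taken from the paper) of how an autoregressive forecaster can be wrapped so that it starts out as a naive persistence model: the last observed value flows through a residual skip connection, while the network's output is scaled by a multiplicative gate initialized to zero, in the spirit of ReZero.

```python
import torch
import torch.nn as nn

class PersistenceInit(nn.Module):
    """Hypothetical sketch: wraps any autoregressive forecaster so that,
    at initialization, it behaves exactly like a naive persistence model."""

    def __init__(self, forecaster: nn.Module, horizon: int):
        super().__init__()
        self.forecaster = forecaster  # maps (batch, context_len) -> (batch, horizon)
        self.horizon = horizon
        # Multiplicative gate initialized to zero (ReZero-style), so the
        # learned network contributes nothing at the start of training.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # Naive persistence forecast: repeat the last observed value.
        persistence = context[:, -1:].expand(-1, self.horizon)
        # Residual skip connection plus gated network output.
        return persistence + self.gate * self.forecaster(context)

# Toy usage: a small MLP stands in for the paper's decoder Transformer.
toy_net = nn.Sequential(nn.Linear(48, 64), nn.ReLU(), nn.Linear(64, 12))
model = PersistenceInit(toy_net, horizon=12)
print(model(torch.randn(8, 48)).shape)  # torch.Size([8, 12])
```

With the gate at zero, the initial forecast equals the persistence baseline, so training only has to learn a correction on top of it, which is consistent with the faster convergence reported in the ablation studies.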
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - Are Self-Attentions Effective for Time Series Forecasting? [4.990206466948269]
Time series forecasting is crucial for applications across multiple domains and various scenarios.
Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches.
We introduce a new architecture, the Cross-Attention-only Time Series Transformer (CATS)
Our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
arXiv Detail & Related papers (2024-05-27T06:49:39Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai)
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM)
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS)
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z) - Hybrid State Space-based Learning for Sequential Data Prediction with
Joint Optimization [0.0]
We introduce a hybrid model that mitigates, via a joint mechanism, the domain-specific feature engineering requirements of conventional nonlinear prediction models.
We achieve this by introducing novel state space representations for the base models, which are then combined to provide a full state space representation of the hybrid or the ensemble.
Due to such novel combination and joint optimization, we demonstrate significant improvements in widely publicized real life competition datasets.
arXiv Detail & Related papers (2023-09-19T12:00:28Z) - A Transformer-based Framework For Multi-variate Time Series: A Remaining
Useful Life Prediction Use Case [4.0466311968093365]
This work proposed an encoder Transformer architecture-based framework for time series prediction.
We validated the effectiveness of the proposed framework on all four sets of the C-MAPSS benchmark dataset.
To make the model aware of the initial stages of the machine's life and of its degradation path, a novel expanding window method was proposed.
arXiv Detail & Related papers (2023-08-19T02:30:35Z) - Two Steps Forward and One Behind: Rethinking Time Series Forecasting
with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks.
We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting.
We propose a set of alternative models that are better performing and significantly less complex.
arXiv Detail & Related papers (2023-04-10T12:47:42Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, but at a high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.