Persistence Initialization: A novel adaptation of the Transformer
architecture for Time Series Forecasting
- URL: http://arxiv.org/abs/2208.14236v1
- Date: Tue, 30 Aug 2022 13:04:48 GMT
- Title: Persistence Initialization: A novel adaptation of the Transformer
architecture for Time Series Forecasting
- Authors: Espen Haugsdal, Erlend Aune, Massimiliano Ruocco
- Abstract summary: Time series forecasting is an important problem, with many real world applications.
We propose a novel adaptation of the original Transformer architecture focusing on the task of time series forecasting.
We use a decoder Transformer with ReZero normalization and Rotary positional encodings, but the adaptation is applicable to any auto-regressive neural network model.
- Score: 0.7734726150561088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time series forecasting is an important problem, with many real world
applications. Ensembles of deep neural networks have recently achieved
impressive forecasting accuracy, but such large ensembles are impractical in
many real world settings. Transformer models have been successfully applied to a
diverse set of challenging problems. We propose a novel adaptation of the
original Transformer architecture focusing on the task of time series
forecasting, called Persistence Initialization. The model is initialized as a
naive persistence model by using a multiplicative gating mechanism combined
with a residual skip connection. We use a decoder Transformer with ReZero
normalization and Rotary positional encodings, but the adaptation is applicable
to any auto-regressive neural network model. We evaluate our proposed
architecture on the challenging M4 dataset, achieving competitive performance
compared to ensemble based methods. We also compare against existing recently
proposed Transformer models for time series forecasting, showing superior
performance on the M4 dataset. Extensive ablation studies show that Persistence
Initialization leads to better performance and faster convergence. As the size
of the model increases, only the models with our proposed adaptation gain in
performance. We also perform an additional ablation study to determine the
importance of the choice of normalization and positional encoding, and find
both the use of Rotary encodings and ReZero normalization to be essential for
good forecasting performance.
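As a concrete illustration of the mechanism described in the abstract, below is a minimal PyTorch sketch (class and argument names are hypothetical, not taken from the paper) of how an autoregressive forecaster can be wrapped so that it starts out as a naive persistence model: the last observed value flows through a residual skip connection, while the network's output is scaled by a multiplicative gate initialized to zero, in the spirit of ReZero.

```python
import torch
import torch.nn as nn

class PersistenceInit(nn.Module):
    """Hypothetical sketch: wraps any autoregressive forecaster so that,
    at initialization, it behaves exactly like a naive persistence model."""

    def __init__(self, forecaster: nn.Module, horizon: int):
        super().__init__()
        self.forecaster = forecaster  # maps (batch, context_len) -> (batch, horizon)
        self.horizon = horizon
        # Multiplicative gate initialized to zero (ReZero-style), so the
        # learned network contributes nothing at the start of training.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # Naive persistence forecast: repeat the last observed value.
        persistence = context[:, -1:].expand(-1, self.horizon)
        # Residual skip connection plus gated network output.
        return persistence + self.gate * self.forecaster(context)

# Toy usage: a small MLP stands in for the paper's decoder Transformer.
toy_net = nn.Sequential(nn.Linear(48, 64), nn.ReLU(), nn.Linear(64, 12))
model = PersistenceInit(toy_net, horizon=12)
print(model(torch.randn(8, 48)).shape)  # torch.Size([8, 12])
```

With the gate at zero, the initial forecast equals the persistence baseline, so training only has to learn a correction on top of it, which is consistent with the faster convergence reported in the ablation studies.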
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - Are Self-Attentions Effective for Time Series Forecasting? [4.990206466948269]
Time series forecasting is crucial for applications across multiple domains and various scenarios.
Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches.
We introduce a new architecture, the Cross-Attention-only Time Series Transformer (CATS)
Our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
arXiv Detail & Related papers (2024-05-27T06:49:39Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai)
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM)
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS)
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z) - Hybrid State Space-based Learning for Sequential Data Prediction with
Joint Optimization [0.0]
We introduce a hybrid model that mitigates, via a joint mechanism, the domain-specific feature engineering requirements of conventional nonlinear prediction models.
We achieve this by introducing novel state space representations for the base models, which are then combined to provide a full state space representation of the hybrid or the ensemble.
Due to such novel combination and joint optimization, we demonstrate significant improvements in widely publicized real life competition datasets.
arXiv Detail & Related papers (2023-09-19T12:00:28Z) - A Transformer-based Framework For Multi-variate Time Series: A Remaining
Useful Life Prediction Use Case [4.0466311968093365]
This work proposed an encoder Transformer architecture-based framework for time series prediction.
We validated the effectiveness of the proposed framework on all four sets of the C-MAPSS benchmark dataset.
To make the model aware of the initial stages of the machine's life and of its degradation path, a novel expanding window method was proposed.
arXiv Detail & Related papers (2023-08-19T02:30:35Z) - Two Steps Forward and One Behind: Rethinking Time Series Forecasting
with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks.
We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting.
We propose a set of alternative models that are better performing and significantly less complex.
arXiv Detail & Related papers (2023-04-10T12:47:42Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, but at a high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.