PeriodNet: Boosting the Potential of Attention Mechanism for Time Series Forecasting
- URL: http://arxiv.org/abs/2511.19497v1
- Date: Sun, 23 Nov 2025 14:47:38 GMT
- Title: PeriodNet: Boosting the Potential of Attention Mechanism for Time Series Forecasting
- Authors: Bowen Zhao, Huanlai Xing, Zhiwen Xiao, Jincheng Peng, Li Feng, Xinhan Wang, Rong Qu, Hui Li
- Abstract summary: We present PeriodNet, which incorporates period attention and sparse period attention mechanisms for analyzing adjacent periods. PeriodNet achieves a relative improvement of 22% when forecasting time series with a length of 720, in comparison to other models based on the conventional encoder-decoder Transformer architecture.
- Score: 15.752636750230053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The attention mechanism has demonstrated remarkable potential in sequence modeling, exemplified by its successful application in natural language processing with models such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT). Despite these advancements, its utilization in time series forecasting (TSF) has yet to meet expectations. Exploring a better network structure for attention in TSF holds immense significance across various domains. In this paper, we present PeriodNet with a brand new structure to forecast univariate and multivariate time series. PeriodNet incorporates period attention and sparse period attention mechanism for analyzing adjacent periods. It enhances the mining of local characteristics, periodic patterns, and global dependencies. For efficient cross-variable modeling, we introduce an iterative grouping mechanism which can directly reduce the cross-variable redundancy. To fully leverage the extracted features on the encoder side, we redesign the entire architecture of the vanilla Transformer and propose a period diffuser for precise multi-period prediction. Through comprehensive experiments conducted on eight datasets, we demonstrate that PeriodNet outperforms six state-of-the-art models in both univariate and multivariate TSF scenarios in terms of mean square error and mean absolute error. In particular, PeriodNet achieves a relative improvement of 22% when forecasting time series with a length of 720, in comparison to other models based on the conventional encoder-decoder Transformer architecture.
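The abstract gives no implementation detail, but the core idea of attending over adjacent periods can be illustrated with a minimal PyTorch sketch. The module below folds a univariate series into period-length segments and restricts self-attention to a band of neighboring periods; the class name, neighbor window, and all hyper-parameters are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class PeriodAttentionSketch(nn.Module):
    """Illustrative sketch only: fold a univariate series into periods and let
    each period attend to its neighbors. Names and hyper-parameters are
    assumptions for illustration, not PeriodNet's actual implementation."""

    def __init__(self, period_len: int, d_model: int = 64, n_heads: int = 4,
                 neighbor_window: int = 2):
        super().__init__()
        self.period_len = period_len
        self.neighbor_window = neighbor_window        # adjacent periods each period may attend to
        self.embed = nn.Linear(period_len, d_model)   # one token per period
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(d_model, period_len)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len) with seq_len divisible by period_len
        b, seq_len = x.shape
        n_periods = seq_len // self.period_len
        periods = x.reshape(b, n_periods, self.period_len)    # (b, P, L)
        tokens = self.embed(periods)                           # (b, P, d)

        # Band mask: period i may only attend to periods within +/- neighbor_window.
        idx = torch.arange(n_periods)
        dist = (idx[:, None] - idx[None, :]).abs()
        mask = dist > self.neighbor_window                     # True = blocked

        out, _ = self.attn(tokens, tokens, tokens, attn_mask=mask)
        return self.proj(out).reshape(b, seq_len)              # back to a flat series


if __name__ == "__main__":
    model = PeriodAttentionSketch(period_len=24)
    series = torch.randn(8, 24 * 7)        # a week of hourly data, batch of 8
    print(model(series).shape)             # torch.Size([8, 168])
```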
Related papers
- DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well.
Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z) - Patch-Level Tokenization with CNN Encoders and Attention for Improved Transformer Time-Series Forecasting [0.0]
This paper proposes a two-stage forecasting framework that separates local temporal representation learning from global dependency modelling.
A convolutional neural network operates on fixed-length temporal patches to extract short-range temporal dynamics and non-linear feature interactions.
Token-level self-attention is applied during representation learning to refine these embeddings, after which a Transformer encoder models inter-patch temporal dependencies to generate forecasts.
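A minimal sketch of that two-stage layout, assuming PyTorch and illustrative layer sizes: a small CNN turns each fixed-length patch into a token, and a Transformer encoder then models dependencies across patch tokens. Module and parameter names are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class PatchCNNTransformerSketch(nn.Module):
    """Rough sketch of the two-stage idea: a CNN encodes fixed-length patches
    into tokens, then a Transformer encoder models dependencies between the
    patch tokens. Layer sizes are illustrative guesses."""

    def __init__(self, patch_len: int = 16, d_model: int = 64,
                 n_heads: int = 4, n_layers: int = 2, horizon: int = 96):
        super().__init__()
        self.patch_len = patch_len
        # Stage 1: local representation learning on each patch.
        self.patch_cnn = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool1d(1),               # one embedding per patch
        )
        # Stage 2: global dependency modelling across patch tokens.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len), seq_len divisible by patch_len
        b, seq_len = x.shape
        n_patches = seq_len // self.patch_len
        patches = x.reshape(b * n_patches, 1, self.patch_len)  # (b*N, 1, L)
        tokens = self.patch_cnn(patches).squeeze(-1)            # (b*N, d)
        tokens = tokens.reshape(b, n_patches, -1)                # (b, N, d)
        encoded = self.encoder(tokens)                           # (b, N, d)
        return self.head(encoded.mean(dim=1))                    # (b, horizon)


if __name__ == "__main__":
    model = PatchCNNTransformerSketch()
    x = torch.randn(4, 16 * 24)
    print(model(x).shape)   # torch.Size([4, 96])
```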
arXiv Detail & Related papers (2026-01-18T16:16:01Z) - MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies.
We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis.
Experiments on major open and synthetic datasets show state-of-the-art performance.
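As a hedged illustration of the frequency-analysis building block mentioned above, the helper below uses the real FFT to pick out the dominant frequency bins of a series. This is a generic technique common to periodicity-aware forecasters; the actual MFRS reference-series construction is more involved and is not reproduced here.

```python
import torch

def dominant_frequencies(x: torch.Tensor, top_k: int = 3):
    """Illustrative helper: find the strongest frequencies of a series via the
    real FFT. Not the MFRS method itself, just the generic building block.

    x: (seq_len,) -> indices of the top_k frequency bins (excluding the DC term)
    """
    amp = torch.fft.rfft(x).abs()
    amp[0] = 0.0                      # drop the DC (mean) component
    return torch.topk(amp, top_k).indices


if __name__ == "__main__":
    t = torch.arange(512, dtype=torch.float32)
    series = torch.sin(2 * torch.pi * t / 24) + 0.5 * torch.sin(2 * torch.pi * t / 168)
    print(dominant_frequencies(series))   # peaks near bins 512/24 ≈ 21 and 512/168 ≈ 3
```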
arXiv Detail & Related papers (2025-03-11T11:40:14Z) - EDformer: Embedded Decomposition Transformer for Interpretable Multivariate Time Series Predictions [4.075971633195745]
This paper introduces an embedded transformer, 'EDformer', for time series forecasting tasks.
Without altering the fundamental elements, we reuse the Transformer architecture and consider the capable functions of its constituent parts.
The model obtains state-of-the-art prediction results in terms of accuracy and efficiency on complex real-world time series datasets.
arXiv Detail & Related papers (2024-12-16T11:13:57Z) - LATST: Are Transformers Necessarily Complex for Time-Series Forecasting [0.0]
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision.
Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain.
We introduce LATST, a novel approach designed to mitigate entropy collapse and training instability, common challenges in Transformer-based time series forecasting.
arXiv Detail & Related papers (2024-10-31T09:09:39Z) - Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a causal Transformer for unified time series forecasting.
Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting [8.841114905151152]
Local Attention Mechanism (LAM) is an efficient attention mechanism tailored for time series analysis.
LAM exploits the continuity properties of time series to reduce the number of attention scores computed.
We present an algorithm for implementing LAM in tensor algebra that runs in O(n log n) time and memory.
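The snippet above only names the idea; a naive, dense version of a local (banded) attention makes the mechanism concrete. The function below masks out attention scores beyond a fixed window, so it is O(n^2) and is not the paper's O(n log n) tensor-algebra formulation; the window size and names are illustrative assumptions.

```python
import math
import torch

def local_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, window: int):
    """Naive banded attention: each time step only attends to keys within
    +/- `window` positions. Dense O(n^2) illustration, not LAM's efficient form.

    q, k, v: (batch, seq_len, d)
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)           # (b, n, n)
    idx = torch.arange(q.size(1), device=q.device)
    band = (idx[:, None] - idx[None, :]).abs() <= window       # True = keep
    scores = scores.masked_fill(~band, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    b, n, d = 2, 128, 32
    q = k = v = torch.randn(b, n, d)
    print(local_attention(q, k, v, window=8).shape)   # torch.Size([2, 128, 32])
```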
arXiv Detail & Related papers (2024-10-04T11:32:02Z) - PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
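Decomposition-based forecasters typically split a series into a smooth trend and a seasonal remainder before modelling each part. The sketch below shows a generic moving-average decomposition of this kind (in the style popularised by Autoformer/DLinear-like models); it illustrates the general technique, not the specific decomposition proposed in the paper above.

```python
import torch
import torch.nn.functional as F

def decompose(x: torch.Tensor, kernel: int = 25):
    """Generic moving-average series decomposition (illustrative only).

    x: (batch, seq_len) -> (trend, seasonal), both (batch, seq_len)
    """
    pad = (kernel - 1) // 2
    # Replicate-pad the ends so the moving average keeps the original length.
    padded = F.pad(x.unsqueeze(1), (pad, kernel - 1 - pad), mode="replicate")
    trend = F.avg_pool1d(padded, kernel_size=kernel, stride=1).squeeze(1)
    seasonal = x - trend
    return trend, seasonal


if __name__ == "__main__":
    t = torch.linspace(0, 8 * torch.pi, 256)
    series = (0.05 * torch.arange(256) + torch.sin(t)).unsqueeze(0)  # trend + cycle
    trend, seasonal = decompose(series)
    print(trend.shape, seasonal.shape)   # torch.Size([1, 256]) torch.Size([1, 256])
```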
arXiv Detail & Related papers (2024-01-22T13:15:40Z) - Stecformer: Spatio-temporal Encoding Cascaded Transformer for Multivariate Long-term Time Series Forecasting [11.021398675773055]
We propose a complete solution to address problems in terms of feature extraction and target prediction.
For extraction, we design an efficient spatio-temporal encoding extractor, including a semi-adaptive graph, to acquire sufficient spatio-temporal information.
For prediction, we propose a Cascaded Decoding Predictor (CDP) to strengthen the correlation between different intervals.
arXiv Detail & Related papers (2023-05-25T13:00:46Z) - CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI type Transformer in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture temporal correlations among signals.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)