PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting
- URL: http://arxiv.org/abs/2508.13773v2
- Date: Fri, 22 Aug 2025 15:38:35 GMT
- Title: PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting
- Authors: Tian Sun, Yuqi Chen, Weiwei Sun
- Abstract summary: We propose a simple yet effective mechanism, Periodic-Nested Group Attention, namely PENGUIN. Our approach highlights the importance of explicitly modeling periodic patterns and incorporating relative attention bias for effective time series modeling. Experiments across diverse benchmarks demonstrate that PENGUIN consistently outperforms both MLP-based and Transformer-based models.
- Score: 3.161024408916268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-term time series forecasting (LTSF) is a fundamental task with wide-ranging applications. Although Transformer-based models have made significant breakthroughs in forecasting, their effectiveness for time series forecasting remains debatable. In this paper, we revisit the significance of self-attention and propose a simple yet effective mechanism, Periodic-Nested Group Attention, namely PENGUIN. Our approach highlights the importance of explicitly modeling periodic patterns and incorporating relative attention bias for effective time series modeling. To this end, we introduce a periodic-nested relative attention bias that captures periodic structures directly. To handle multiple coexisting periodicities (e.g., daily and weekly cycles), we design a grouped attention mechanism, where each group targets a specific periodicity using a multi-query attention mechanism. Extensive experiments across diverse benchmarks demonstrate that PENGUIN consistently outperforms both MLP-based and Transformer-based models.
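The two mechanisms the abstract describes — a relative attention bias that repeats with a chosen period, and grouped attention in which each group targets one periodicity via multi-query attention — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; all function names, shapes, and the way groups are combined are illustrative assumptions, and the learned bias table is replaced by random values.

```python
import numpy as np

def periodic_relative_bias(seq_len: int, period: int) -> np.ndarray:
    """Bias b[i, j] that depends only on (i - j) mod period,
    so positions one full cycle apart share the same bias."""
    table = np.random.default_rng(0).normal(size=period)  # learned in practice
    offsets = (np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]) % period
    return table[offsets]

def grouped_periodic_attention(x: np.ndarray, periods: list[int]) -> np.ndarray:
    """Multi-query-style grouped attention: keys/values are shared across
    groups, and each group attends with the bias of its own period."""
    seq_len, d = x.shape
    k, v = x, x  # shared K/V; projections omitted for brevity
    outputs = []
    for p in periods:
        q = x  # per-group query projection also omitted in this sketch
        scores = q @ k.T / np.sqrt(d) + periodic_relative_bias(seq_len, p)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        outputs.append(weights @ v)
    return np.mean(outputs, axis=0)  # combine groups (average here; could concat)

x = np.random.default_rng(1).normal(size=(48, 8))
y = grouped_periodic_attention(x, periods=[24, 7])  # daily- and weekly-style cycles
```

The key property is that the bias matrix is a function of `(i - j) mod period`, so the same attention preference recurs every cycle, which is how the bias "captures periodic structures directly"; running several groups with different periods handles coexisting periodicities such as daily and weekly cycles.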
Related papers
- PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting [30.347634829157766]
We propose PHAT (Period Heterogeneity-Aware Transformer) for modeling periodicity in real-world data. By restricting interactions within buckets and masking cross-bucket connections, PHAT effectively avoids interference from inconsistent periods. We evaluate PHAT on 14 real-world datasets against 18 baselines, and the results show that it significantly outperforms existing methods.
arXiv Detail & Related papers (2026-01-31T10:58:09Z) - PeriodNet: Boosting the Potential of Attention Mechanism for Time Series Forecasting [15.752636750230053]
We present PeriodNet, which incorporates period attention and sparse period attention mechanisms for analyzing adjacent periods. PeriodNet achieves a relative improvement of 22% when forecasting time series with a length of 720, in comparison to other models based on the conventional encoder-decoder Transformer architecture.
arXiv Detail & Related papers (2025-11-23T14:47:38Z) - Chronos-2: From Univariate to Universal Forecasting [52.753731922908905]
Chronos-2 is a pretrained model capable of handling univariate, multivariate, and covariate-informed forecasting tasks in a zero-shot manner. It delivers state-of-the-art performance across three comprehensive benchmarks: fev-bench, GIFT-Eval, and Chronos Benchmark II. The in-context learning capabilities of Chronos-2 establish it as a general-purpose forecasting model that can be used "as is" in real-world forecasting pipelines.
arXiv Detail & Related papers (2025-10-17T17:00:53Z) - A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers. We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series. FIRE consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z) - VARMA-Enhanced Transformer for Time Series Forecasting [4.982130518684668]
VARMAformer is a novel architecture that synergizes the efficiency of a cross-attention-only framework with the principles of classical time series analysis. By fusing these classical insights into a modern backbone, VARMAformer captures both global, long-range dependencies and local, statistical structures.
arXiv Detail & Related papers (2025-09-05T03:32:51Z) - MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z) - TimesBERT: A BERT-Style Foundation Model for Time Series Understanding [72.64824086839631]
GPT-style models have been positioned as foundation models for time series forecasting, but the BERT-style architecture has not been fully unlocked for time series understanding. We design TimesBERT to learn generic representations of time series. Our model is pre-trained on 260 billion time points across diverse domains.
arXiv Detail & Related papers (2025-02-28T17:14:44Z) - FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities [17.164913785452367]
We propose FlexTSF, a universal time series forecasting model that possesses better generalization and supports both regular and irregular time series.
Experiments on 12 datasets show that FlexTSF outperforms state-of-the-art forecasting models designed for regular and irregular time series, respectively.
arXiv Detail & Related papers (2024-10-30T16:14:09Z) - Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting [36.577411683455786]
Recent linear and transformer-based forecasters have shown superior performance in time series forecasting.
They are constrained by their inherent inability to effectively address long-range dependencies in time series data.
We introduce a fast and effective Spectral Attention mechanism, which preserves temporal correlations among samples.
arXiv Detail & Related papers (2024-10-28T06:17:20Z) - Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a causal Transformer for unified time series forecasting. Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - FAITH: Frequency-domain Attention In Two Horizons for Time Series Forecasting [13.253624747448935]
Time Series Forecasting plays a crucial role in various fields such as industrial equipment maintenance, meteorology, energy consumption, traffic flow and financial investment.
Current deep learning-based predictive models often exhibit a significant deviation between their forecasting outcomes and the ground truth.
We propose a novel model Frequency-domain Attention In Two Horizons, which decomposes time series into trend and seasonal components.
arXiv Detail & Related papers (2024-05-22T02:37:02Z) - Towards Expressive Spectral-Temporal Graph Neural Networks for Time Series Forecasting [101.5022396668152]
The spectral-temporal graph neural network is a promising abstraction underlying most time series forecasting models. We establish a theoretical framework that unravels the expressive power of spectral-temporal GNNs. Our findings pave the way for devising a broader array of provably expressive GNN-based models for time series.
arXiv Detail & Related papers (2023-05-11T05:56:38Z) - Model-Attentive Ensemble Learning for Sequence Modeling [86.4785354333566]
We present Model-Attentive Ensemble learning for Sequence modeling (MAES).
MAES is a mixture of time-series experts which leverages an attention-based gating mechanism to specialize the experts on different sequence dynamics and adaptively weight their predictions.
We demonstrate that MAES significantly outperforms popular sequence models on datasets subject to temporal shift.
arXiv Detail & Related papers (2021-02-23T05:23:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.