PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting
- URL: http://arxiv.org/abs/2508.13773v2
- Date: Fri, 22 Aug 2025 15:38:35 GMT
- Title: PENGUIN: Enhancing Transformer with Periodic-Nested Group Attention for Long-term Time Series Forecasting
- Authors: Tian Sun, Yuqi Chen, Weiwei Sun
- Abstract summary: We propose a simple yet effective mechanism, Periodic-Nested Group Attention, namely PENGUIN. Our approach highlights the importance of explicitly modeling periodic patterns and incorporating relative attention bias for effective time series modeling. Experiments across diverse benchmarks demonstrate that PENGUIN consistently outperforms both MLP-based and Transformer-based models.
- Score: 3.161024408916268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-term time series forecasting (LTSF) is a fundamental task with wide-ranging applications. Although Transformer-based models have made significant breakthroughs in forecasting, their effectiveness for time series forecasting remains debatable. In this paper, we revisit the significance of self-attention and propose a simple yet effective mechanism, Periodic-Nested Group Attention, namely PENGUIN. Our approach highlights the importance of explicitly modeling periodic patterns and incorporating relative attention bias for effective time series modeling. To this end, we introduce a periodic-nested relative attention bias that captures periodic structures directly. To handle multiple coexisting periodicities (e.g., daily and weekly cycles), we design a grouped attention mechanism, where each group targets a specific periodicity using a multi-query attention mechanism. Extensive experiments across diverse benchmarks demonstrate that PENGUIN consistently outperforms both MLP-based and Transformer-based models.
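The two mechanisms the abstract describes — a relative attention bias that repeats with a chosen period, and grouped attention in which each group targets one periodicity via multi-query attention — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; all function names, shapes, and the way groups are combined are illustrative assumptions, and the learned bias table is replaced by random values.

```python
import numpy as np

def periodic_relative_bias(seq_len: int, period: int) -> np.ndarray:
    """Bias b[i, j] that depends only on (i - j) mod period,
    so positions one full cycle apart share the same bias."""
    table = np.random.default_rng(0).normal(size=period)  # learned in practice
    offsets = (np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]) % period
    return table[offsets]

def grouped_periodic_attention(x: np.ndarray, periods: list[int]) -> np.ndarray:
    """Multi-query-style grouped attention: keys/values are shared across
    groups, and each group attends with the bias of its own period."""
    seq_len, d = x.shape
    k, v = x, x  # shared K/V; projections omitted for brevity
    outputs = []
    for p in periods:
        q = x  # per-group query projection also omitted in this sketch
        scores = q @ k.T / np.sqrt(d) + periodic_relative_bias(seq_len, p)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        outputs.append(weights @ v)
    return np.mean(outputs, axis=0)  # combine groups (average here; could concat)

x = np.random.default_rng(1).normal(size=(48, 8))
y = grouped_periodic_attention(x, periods=[24, 7])  # daily- and weekly-style cycles
```

The key property is that the bias matrix is a function of `(i - j) mod period`, so the same attention preference recurs every cycle, which is how the bias "captures periodic structures directly"; running several groups with different periods handles coexisting periodicities such as daily and weekly cycles.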
Related papers
- PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting [30.347634829157766]
We propose PHAT (Period Heterogeneity-Aware Transformer) for modeling periodicity in real-world data. By restricting interactions within buckets and masking cross-bucket connections, PHAT effectively avoids interference from inconsistent periods. We evaluate PHAT on 14 real-world datasets against 18 baselines, and the results show that it significantly outperforms existing methods.
arXiv Detail & Related papers (2026-01-31T10:58:09Z) - PeriodNet: Boosting the Potential of Attention Mechanism for Time Series Forecasting [15.752636750230053]
We present PeriodNet, which incorporates period attention and sparse period attention mechanisms for analyzing adjacent periods. PeriodNet achieves a relative improvement of 22% when forecasting time series with a length of 720, in comparison to other models based on the conventional encoder-decoder Transformer architecture.
arXiv Detail & Related papers (2025-11-23T14:47:38Z) - Chronos-2: From Univariate to Universal Forecasting [52.753731922908905]
Chronos-2 is a pretrained model capable of handling univariate, multivariate, and covariate-informed forecasting tasks in a zero-shot manner. It delivers state-of-the-art performance across three comprehensive benchmarks: fev-bench, GIFT-Eval, and Chronos Benchmark II. The in-context learning capabilities of Chronos-2 establish it as a general-purpose forecasting model that can be used "as is" in real-world forecasting pipelines.
arXiv Detail & Related papers (2025-10-17T17:00:53Z) - A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers. We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series. FIRE consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z) - VARMA-Enhanced Transformer for Time Series Forecasting [4.982130518684668]
VARMAformer is a novel architecture that synergizes the efficiency of a cross-attention-only framework with the principles of classical time series analysis. By fusing these classical insights into a modern backbone, VARMAformer captures both global, long-range dependencies and local, statistical structures.
arXiv Detail & Related papers (2025-09-05T03:32:51Z) - MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z) - TimesBERT: A BERT-Style Foundation Model for Time Series Understanding [72.64824086839631]
GPT-style models have been positioned as foundation models for time series forecasting, but the BERT-style architecture has not been fully unlocked for time series understanding. We design TimesBERT to learn generic representations of time series. Our model is pre-trained on 260 billion time points across diverse domains.
arXiv Detail & Related papers (2025-02-28T17:14:44Z) - FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities [17.164913785452367]
We propose FlexTSF, a universal time series forecasting model that possesses better generalization and supports both regular and irregular time series.
Experiments on 12 datasets show that FlexTSF outperforms state-of-the-art forecasting models designed for regular and irregular time series, respectively.
arXiv Detail & Related papers (2024-10-30T16:14:09Z) - Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting [36.577411683455786]
Recent linear and transformer-based forecasters have shown superior performance in time series forecasting.
They are constrained by their inherent inability to effectively address long-range dependencies in time series data.
We introduce a fast and effective Spectral Attention mechanism, which preserves temporal correlations among samples.
arXiv Detail & Related papers (2024-10-28T06:17:20Z) - Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a causal Transformer for unified time series forecasting. Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - FAITH: Frequency-domain Attention In Two Horizons for Time Series Forecasting [13.253624747448935]
Time Series Forecasting plays a crucial role in various fields such as industrial equipment maintenance, meteorology, energy consumption, traffic flow and financial investment.
Current deep learning-based predictive models often exhibit a significant deviation between their forecasting outcomes and the ground truth.
We propose a novel model Frequency-domain Attention In Two Horizons, which decomposes time series into trend and seasonal components.
arXiv Detail & Related papers (2024-05-22T02:37:02Z) - Towards Expressive Spectral-Temporal Graph Neural Networks for Time Series Forecasting [101.5022396668152]
The spectral-temporal graph neural network is a promising abstraction underlying most time series forecasting models. We establish a theoretical framework that unravels the expressive power of spectral-temporal GNNs. Our findings pave the way for devising a broader array of provably expressive GNN-based models for time series.
arXiv Detail & Related papers (2023-05-11T05:56:38Z) - Model-Attentive Ensemble Learning for Sequence Modeling [86.4785354333566]
We present Model-Attentive Ensemble learning for Sequence modeling (MAES).
MAES is a mixture of time-series experts which leverages an attention-based gating mechanism to specialize the experts on different sequence dynamics and adaptively weight their predictions.
We demonstrate that MAES significantly outperforms popular sequence models on datasets subject to temporal shift.
arXiv Detail & Related papers (2021-02-23T05:23:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.