CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
- URL: http://arxiv.org/abs/2602.02729v1
- Date: Mon, 02 Feb 2026 19:44:24 GMT
- Title: CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
- Authors: Viresh Pati, Yubin Kim, Vinh Pham, Jevon Twitty, Shihao Yang, Jiecheng Lu
- Abstract summary: CAPS combines SO(2) rotations for phase alignment with three additive gating paths. We introduce the Clock mechanism, a learned temporal weighting that modulates these paths through a shared notion of temporal importance.
- Score: 5.339037322817684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents $\textbf{CAPS}$ (Clock-weighted Aggregation with Prefix-products and Softmax), a structured attention mechanism for time series forecasting that decouples three distinct temporal structures: global trends, local shocks, and seasonal patterns. Standard softmax attention entangles these through global normalization, while recent recurrent models sacrifice long-term, order-independent selection for order-dependent causal structure. CAPS combines SO(2) rotations for phase alignment with three additive gating paths -- Riemann softmax, prefix-product gates, and a Clock baseline -- within a single attention layer. We introduce the Clock mechanism, a learned temporal weighting that modulates these paths through a shared notion of temporal importance. Experiments on long- and short-term forecasting benchmarks surpass vanilla softmax and linear attention mechanisms and demonstrate competitive performance against seven strong baselines with linear complexity. Our code implementation is available at https://github.com/vireshpati/CAPS-Attention.
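To make two of the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation (which lives in the linked repository). It assumes the standard textbook forms: an SO(2) rotation acts on a pair of features as a 2-D rotation by an angle proportional to the time index, and a prefix-product gate weights a past step by the product of the per-step gate values between it and the current step. The function names, the frequency `omega`, and the example values are hypothetical and chosen only for illustration; the Riemann softmax and Clock paths are not modeled.

```python
import math


def rotate(pair, angle):
    """Apply a 2-D rotation (an element of SO(2)) to a feature pair."""
    x, y = pair
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def dot(a, b):
    """Plain 2-D dot product."""
    return a[0] * b[0] + a[1] * b[1]


def score(q, k, m, n, omega=0.1):
    """Score between a query at step m and a key at step n.

    Both vectors are rotated by an angle proportional to their step
    (omega is a hypothetical frequency), so the result depends only on n - m.
    """
    return dot(rotate(q, omega * m), rotate(k, omega * n))


def prefix_product_weights(gates):
    """Causal weights from per-step gates (assumed to lie in [0, 1]).

    weights[t][s] is the product of gates[s+1..t] for s <= t and 0 otherwise,
    so a step's influence decays with every gate applied after it.
    """
    n = len(gates)
    weights = [[0.0] * n for _ in range(n)]
    for t in range(n):
        acc = 1.0
        for s in range(t, -1, -1):
            weights[t][s] = acc
            acc *= gates[s]
    return weights


# Hypothetical inputs for illustration only.
q, k = (1.0, 0.5), (0.3, -0.2)
same_offset_a = score(q, k, 5, 2)
same_offset_b = score(q, k, 13, 10)
decay = prefix_product_weights([0.9, 0.5, 0.8])
```

Because rotating the query by `omega * m` and the key by `omega * n` leaves a score that depends only on `n - m`, the two scores in the demo agree up to floating-point error, which is the sense in which rotations encode relative phase. The prefix-product weights, by contrast, depend on every gate between the two steps and therefore on their order.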
Related papers
- FuXi-Linear: Unleashing the Power of Linear Attention in Long-term Time-aware Sequential Recommendation [86.55349738440087]
FuXi-Linear is a linear-complexity model designed for efficient long-sequence recommendation. Our approach introduces two key components: (1) a Temporal Retention Channel that independently computes periodic attention weights using temporal data, preventing crosstalk between temporal and semantic signals; and (2) a Linear Positional Channel that integrates positional information through learnable kernels within linear complexity.
arXiv Detail & Related papers (2026-02-27T04:38:28Z) - StretchTime: Adaptive Time Series Forecasting via Symplectic Attention [5.339037322817684]
We show that rotary position embedding is mathematically incapable of representing non-affine temporal warping. We propose Symplectic Positional Embeddings (SyPE), a learnable encoding framework derived from Hamiltonian mechanics. SyPE strictly generalizes RoPE by extending the rotation group $\mathrm{SO}(2)$ to the symplectic group $\mathrm{Sp}(2,\mathbb{R})$, modulated by a novel input-dependent adaptive warp module.
arXiv Detail & Related papers (2026-02-09T18:29:25Z) - Higher-order Linear Attention [59.92962330635185]
The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. We introduce Higher-order Linear Attention (HLA), a causal, streaming mechanism that realizes higher interactions via compact prefix sufficient statistics.
arXiv Detail & Related papers (2025-10-31T07:54:37Z) - EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting [50.794700596484894]
We propose EntroPE (Entropy-Guided Dynamic Patch), a novel, temporally informed framework that dynamically detects transition points via conditional entropy. This preserves temporal structure while retaining the computational benefits of patching. Experiments across long-term forecasting benchmarks demonstrate that EntroPE improves both accuracy and efficiency.
arXiv Detail & Related papers (2025-09-30T12:09:56Z) - Kairos: Towards Adaptive and Generalizable Time Series Foundation Models [27.076542021368056]
Time series foundation models (TSFMs) have emerged as a powerful paradigm for time series analysis. We propose Kairos, a flexible TSFM framework that integrates a dynamic patching tokenizer and an instance-adaptive positional embedding. Kairos achieves superior performance with much fewer parameters on two common zero-shot benchmarks.
arXiv Detail & Related papers (2025-09-30T06:02:26Z) - Revitalizing Canonical Pre-Alignment for Irregular Multivariate Time Series Forecasting [17.046106977768215]
We propose KAFNet, a compact architecture grounded in Canonical Pre-Alignment (CPA) for IMTS forecasting. KAFNet achieves state-of-the-art forecasting performance, with a 7.2$\times$ parameter reduction and an 8.4$\times$ training-inference acceleration.
arXiv Detail & Related papers (2025-08-04T01:07:24Z) - AutoHFormer: Efficient Hierarchical Autoregressive Transformer for Time Series Prediction [36.239648954658534]
Time series forecasting requires architectures that simultaneously achieve three competing objectives. We introduce AutoHFormer, a hierarchical autoregressive transformer that addresses these challenges. Comprehensive experiments demonstrate that AutoHFormer achieves 10.76X faster training and 6.06X memory reduction compared to PatchTST on P08.
arXiv Detail & Related papers (2025-06-19T03:47:04Z) - MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z) - A Decomposition Modeling Framework for Seasonal Time-Series Forecasting [0.0]
Seasonal time series exhibit intricate long-term dependencies. This paper introduces the Multi-scale Seasonal Decomposition Model (MSSD) for seasonal time-series forecasting.
arXiv Detail & Related papers (2024-12-12T01:37:25Z) - Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a causal Transformer for unified time series forecasting. Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - Sparse Transformer with Local and Seasonal Adaptation for Multivariate Time Series Forecasting [8.000134983886742]
We propose a Dozer Attention mechanism consisting of three sparse components.
These components are designed to capture essential attributes of MTS data, including locality, seasonality, and global temporal dependencies.
We present the Dozerformer Framework, incorporating the Dozer Attention mechanism for the MTS forecasting task.
arXiv Detail & Related papers (2023-12-11T22:49:02Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model that improves classification capacity on the multivariate time series classification task.
It has three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.