Related papers: TimeFormer: Transformer with Attention Modulation Empowered by Temporal Characteristics for Time Series Forecasting

TimeFormer: Transformer with Attention Modulation Empowered by Temporal Characteristics for Time Series Forecasting

URL: http://arxiv.org/abs/2510.06680v1
Date: Wed, 08 Oct 2025 06:07:30 GMT
Title: TimeFormer: Transformer with Attention Modulation Empowered by Temporal Characteristics for Time Series Forecasting
Authors: Zhipeng Liu, Peibo Duan, Xuan Tang, Baixin Li, Yongsheng Huang, Mingyang Geng, Changsheng Zhang, Bin Zhang, Binwu Wang,
Abstract summary: We develop a novel Transformer architecture designed for time series data, aiming to maximize its representational capacity.<n>We identify two key but often overlooked characteristics of time series: (1) unidirectional influence from the past to the future, and (2) the phenomenon of decaying influence over time.<n>We propose TimeFormer, whose core innovation is a self-attention mechanism with two modulation terms (MoSA), designed to capture these temporal priors of time series.
Score: 18.890651211582256
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although Transformers excel in natural language processing, their extension to time series forecasting remains challenging due to insufficient consideration of the differences between textual and temporal modalities. In this paper, we develop a novel Transformer architecture designed for time series data, aiming to maximize its representational capacity. We identify two key but often overlooked characteristics of time series: (1) unidirectional influence from the past to the future, and (2) the phenomenon of decaying influence over time. These characteristics are introduced to enhance the attention mechanism of Transformers. We propose TimeFormer, whose core innovation is a self-attention mechanism with two modulation terms (MoSA), designed to capture these temporal priors of time series under the constraints of the Hawkes process and causal masking. Additionally, TimeFormer introduces a framework based on multi-scale and subsequence analysis to capture semantic dependencies at different temporal scales, enriching the temporal dependencies. Extensive experiments conducted on multiple real-world datasets show that TimeFormer significantly outperforms state-of-the-art methods, achieving up to a 7.45% reduction in MSE compared to the best baseline and setting new benchmarks on 94.04\% of evaluation metrics. Moreover, we demonstrate that the MoSA mechanism can be broadly applied to enhance the performance of other Transformer-based models.

Related papers

MEMTS: Internalizing Domain Knowledge via Parameterized Memory for Retrieval-Free Domain Adaptation of Time Series Foundation Models [51.506429027626005]
Memory for Time Series (MEMTS) is a lightweight and plug-and-play method for retrieval-free domain adaptation in time series forecasting.<n>Key component of MEMTS is a Knowledge Persistence Module (KPM), which internalizes domain-specific temporal dynamics.<n>This paradigm shift enables MEMTS to achieve accurate domain adaptation with constant-time inference and near-zero latency.
arXiv Detail & Related papers (2026-02-14T14:00:06Z)
DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well.<n>Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS)
arXiv Detail & Related papers (2026-02-06T10:48:13Z)
Patch-Level Tokenization with CNN Encoders and Attention for Improved Transformer Time-Series Forecasting [0.0]
This paper proposes a two-stage forecasting framework that separates local temporal representation learning from global dependency modelling.<n>A convolutional neural network operates on fixed-length temporal patches to extract short-range temporal dynamics and non-linear feature interactions.<n> Token-level self-attention is applied during representation learning to refine these embeddings, after which a Transformer encoder models inter-patch temporal dependencies to generate forecasts.
arXiv Detail & Related papers (2026-01-18T16:16:01Z)
Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.<n>Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.<n>Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
Ister: Inverted Seasonal-Trend Decomposition Transformer for Explainable Multivariate Time Series Forecasting [10.32586981170693]
Inverted Seasonal-Trend Decomposition Transformer (Ister)<n>We introduce a novel Dot-attention mechanism that improves interpretability, computational efficiency, and predictive accuracy.<n>Ister enables intuitive visualization of component contributions, shedding lights on model's decision process and enhancing transparency in prediction results.
arXiv Detail & Related papers (2024-12-25T06:37:19Z)
LATST: Are Transformers Necessarily Complex for Time-Series Forecasting [0.0]
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision.<n>Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain.<n>We introduce LATST, a novel approach designed to mitigate entropy collapse and training instability common challenges in Transformer-based time series forecasting.
arXiv Detail & Related papers (2024-10-31T09:09:39Z)
Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a causal Transformer for unified time series forecasting.<n>Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences. We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task. It exhibits three aspects of merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tacking the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
ViTs for SITS: Vision Transformers for Satellite Image Time Series [52.012084080257544]
We introduce a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT) TSViT splits a SITS record into non-overlapping patches in space and time which are tokenized and subsequently processed by a factorized temporo-spatial encoder.
arXiv Detail & Related papers (2023-01-12T11:33:07Z)
DRAformer: Differentially Reconstructed Attention Transformer for Time-Series Forecasting [7.805077630467324]
Time-series forecasting plays an important role in many real-world scenarios, such as equipment life cycle forecasting, weather forecasting, and traffic flow forecasting. It can be observed from recent research that a variety of transformer-based models have shown remarkable results in time-series forecasting. However, there are still some issues that limit the ability of transformer-based models on time-series forecasting tasks.
arXiv Detail & Related papers (2022-06-11T10:34:29Z)
ETSformer: Exponential Smoothing Transformers for Time-series Forecasting [35.76867542099019]
We propose ETSFormer, a novel time-series Transformer architecture, which exploits the principle of exponential smoothing in improving Transformers for time-series forecasting. In particular, inspired by the classical exponential smoothing methods in time-series forecasting, we propose the novel exponential smoothing attention (ESA) and frequency attention (FA) to replace the self-attention mechanism in vanilla Transformers, thus improving both accuracy and efficiency.
arXiv Detail & Related papers (2022-02-03T02:50:44Z)
Temporal Attention Augmented Transformer Hawkes Process [4.624987488467739]
We come up with a new kind of Transformer-based Hawkes process model, Temporal Attention Augmented Transformer Hawkes Process (TAA-THP) We modify the traditional dot-product attention structure, and introduce the temporal encoding into attention structure. We conduct numerous experiments on a wide range of synthetic and real-life datasets to validate the performance of our proposed TAA-THP model.
arXiv Detail & Related papers (2021-12-29T09:45:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.