Related papers: WDformer: A Wavelet-based Differential Transformer Model for Time Series Forecasting

WDformer: A Wavelet-based Differential Transformer Model for Time Series Forecasting

URL: http://arxiv.org/abs/2509.25231v1
Date: Thu, 25 Sep 2025 02:43:51 GMT
Title: WDformer: A Wavelet-based Differential Transformer Model for Time Series Forecasting
Authors: Xiaojian Wang, Chaoli Zhang, Zhonglong Zheng, Yunliang Jiang,
Abstract summary: Time series forecasting has various applications, such as meteorological rainfall prediction, traffic flow analysis, financial forecasting, and operational load monitoring.<n>Due to the sparsity of time series data, relying solely on time-domain or frequency-domain modeling limits the model's ability to fully leverage multi-domain information.<n>We proposed WDformer, a wavelet-based differential Transformer model, to conduct a multi-resolution analysis of time series data.
Score: 21.222605948133893
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Time series forecasting has various applications, such as meteorological rainfall prediction, traffic flow analysis, financial forecasting, and operational load monitoring for various systems. Due to the sparsity of time series data, relying solely on time-domain or frequency-domain modeling limits the model's ability to fully leverage multi-domain information. Moreover, when applied to time series forecasting tasks, traditional attention mechanisms tend to over-focus on irrelevant historical information, which may introduce noise into the prediction process, leading to biased results. We proposed WDformer, a wavelet-based differential Transformer model. This study employs the wavelet transform to conduct a multi-resolution analysis of time series data. By leveraging the advantages of joint representation in the time-frequency domain, it accurately extracts the key information components that reflect the essential characteristics of the data. Furthermore, we apply attention mechanisms on inverted dimensions, allowing the attention mechanism to capture relationships between multiple variables. When performing attention calculations, we introduced the differential attention mechanism, which computes the attention score by taking the difference between two separate softmax attention matrices. This approach enables the model to focus more on important information and reduce noise. WDformer has achieved state-of-the-art (SOTA) results on multiple challenging real-world datasets, demonstrating its accuracy and effectiveness. Code is available at https://github.com/xiaowangbc/WDformer.

Related papers

FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis [92.23551599659186]
Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology.<n>FusAD is a unified analysis framework designed for diverse time series tasks.
arXiv Detail & Related papers (2025-12-16T04:34:27Z)
A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers.<n>We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series.<n>Fire consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z)
MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies.<n>We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis.<n> Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z)
EDformer: Embedded Decomposition Transformer for Interpretable Multivariate Time Series Predictions [4.075971633195745]
This paper introduces an embedded transformer, 'EDformer', for time series forecasting tasks.<n>Without altering the fundamental elements, we reuse the Transformer architecture and consider the capable functions of its constituent parts.<n>The model obtains state-of-the-art predicting results in terms of accuracy and efficiency on complex real-world time series datasets.
arXiv Detail & Related papers (2024-12-16T11:13:57Z)
TimeSieve: Extracting Temporal Dynamics through Information Bottlenecks [31.10683149519954]
We propose an innovative time series forecasting model TimeSieve. Our approach employs wavelet transforms to preprocess time series data, effectively capturing multi-scale features. Our results validate the effectiveness of our approach in addressing the key challenges in time series forecasting.
arXiv Detail & Related papers (2024-06-07T15:58:12Z)
Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications. Transformers' key feature, the attention mechanism, dynamically fusing embeddings to enhance data representation, often relegating attention weights to a byproduct role. Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
arXiv Detail & Related papers (2024-02-08T03:00:50Z)
Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked-based Universal Time Series Forecasting Transformer (Moirai) Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains. Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graphtemporal process and anomaly scorer to detect anomalies. Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z)
Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series Forecasting Approach [71.67506068703314]
Long-term urban mobility predictions play a crucial role in the effective management of urban facilities and services. Traditionally, urban mobility data has been structured as videos, treating longitude and latitude as fundamental pixels. In our research, we introduce a fresh perspective on urban mobility prediction. Instead of oversimplifying urban mobility data as traditional video data, we regard it as a complex time series.
arXiv Detail & Related papers (2023-12-04T07:39:05Z)
DRAformer: Differentially Reconstructed Attention Transformer for Time-Series Forecasting [7.805077630467324]
Time-series forecasting plays an important role in many real-world scenarios, such as equipment life cycle forecasting, weather forecasting, and traffic flow forecasting. It can be observed from recent research that a variety of transformer-based models have shown remarkable results in time-series forecasting. However, there are still some issues that limit the ability of transformer-based models on time-series forecasting tasks.
arXiv Detail & Related papers (2022-06-11T10:34:29Z)
Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow(MANF) Our model avoids the influence of cumulative error and does not increase the time complexity. Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
Spatiotemporal Attention for Multivariate Time Series Prediction and Interpretation [17.568599402858037]
temporal attention mechanism (STAM) for simultaneous learning of the most important time steps and variables. Results: STAM maintains state-of-the-art prediction accuracy while offering the benefit of accurate interpretability.
arXiv Detail & Related papers (2020-08-11T17:34:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.