FreEformer: Frequency Enhanced Transformer for Multivariate Time Series Forecasting
- URL: http://arxiv.org/abs/2501.13989v1
- Date: Thu, 23 Jan 2025 08:53:45 GMT
- Title: FreEformer: Frequency Enhanced Transformer for Multivariate Time Series Forecasting
- Authors: Wenzhen Yue, Yong Liu, Xianghua Ying, Bowei Xing, Ruohao Guo, Ji Shi,
- Abstract summary: This paper presents FreEformer, a simple yet effective model that leverages a Frequency Enhanced Transformer.
Experiments demonstrate that FreEformer consistently outperforms state-of-the-art models on eighteen real-world benchmarks.
- Score: 17.738942892605234
- Abstract: This paper presents FreEformer, a simple yet effective model that leverages a Frequency Enhanced Transformer for multivariate time series forecasting. Our work is based on the assumption that the frequency spectrum provides a global perspective on the composition of series across various frequencies and is highly suitable for robust representation learning. Specifically, we first convert time series into the complex frequency domain using the Discrete Fourier Transform (DFT). The Transformer architecture is then applied to the frequency spectra to capture cross-variate dependencies, with the real and imaginary parts processed independently. However, we observe that the vanilla attention matrix exhibits a low-rank characteristic, thus limiting representation diversity. This could be attributed to the inherent sparsity of the frequency domain and the strong-value-focused nature of Softmax in vanilla attention. To address this, we enhance the vanilla attention mechanism by introducing an additional learnable matrix to the original attention matrix, followed by row-wise L1 normalization. Theoretical analysis demonstrates that this enhanced attention mechanism improves both feature diversity and gradient flow. Extensive experiments demonstrate that FreEformer consistently outperforms state-of-the-art models on eighteen real-world benchmarks covering electricity, traffic, weather, healthcare and finance. Notably, the enhanced attention mechanism also consistently improves the performance of state-of-the-art Transformer-based forecasters.
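The abstract spells out the attention modification, so a compact sketch is possible. Below is a minimal PyTorch rendering of the pipeline it describes: rFFT into the complex frequency domain, attention across variates on the spectra with real and imaginary parts handled separately, and the enhanced attention (a learnable matrix added to the softmax attention matrix, followed by row-wise L1 normalization). The ReLU clamp that keeps weights nonnegative, the sharing of one attention module across the two streams, and all layer sizes are my assumptions, not the authors' reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhancedAttention(nn.Module):
    """Vanilla attention plus a learnable matrix and row-wise L1 normalization."""
    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(d_model, d_model) for _ in range(3))
        # Learnable matrix added to the softmax attention matrix
        # (assumed to be variate-by-variate, matching the token count).
        self.extra = nn.Parameter(torch.zeros(n_tokens, n_tokens))

    def forward(self, x):                          # x: (batch, tokens, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)
        attn = F.relu(attn + self.extra)           # assumed nonnegativity clamp
        attn = attn / attn.sum(-1, keepdim=True).clamp_min(1e-8)  # row-wise L1 norm
        return attn @ v

class FreqForecaster(nn.Module):
    """rFFT -> attention across variates on real/imag spectra -> linear head."""
    def __init__(self, n_vars: int, seq_len: int, pred_len: int, d_model: int = 64):
        super().__init__()
        n_freq = seq_len // 2 + 1
        self.embed_re = nn.Linear(n_freq, d_model)  # real part, embedded alone
        self.embed_im = nn.Linear(n_freq, d_model)  # imaginary part, ditto
        self.attn = EnhancedAttention(n_vars, d_model)
        self.head = nn.Linear(2 * d_model, pred_len)

    def forward(self, x):                           # x: (batch, seq_len, n_vars)
        spec = torch.fft.rfft(x, dim=1)             # complex spectrum (DFT)
        re = self.attn(self.embed_re(spec.real.transpose(1, 2)))
        im = self.attn(self.embed_im(spec.imag.transpose(1, 2)))
        out = self.head(torch.cat([re, im], dim=-1))  # (batch, n_vars, pred_len)
        return out.transpose(1, 2)                  # (batch, pred_len, n_vars)
```

For example, `FreqForecaster(n_vars=7, seq_len=96, pred_len=24)(torch.randn(8, 96, 7))` returns an `(8, 24, 7)` forecast.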
Related papers
- Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.
Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.
Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
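A hedged sketch of the reweighting Powerformer's abstract describes: causal softmax attention whose weights are multiplied by a smooth heavy-tailed decay over the time lag and then renormalized. The power-law form `(1 + lag)^(-alpha)` and the parameter `alpha` are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def weighted_causal_attention(q, k, v, alpha: float = 1.0):
    """q, k, v: (batch, heads, time, dim). Causal attention whose weights are
    reweighted by a smooth heavy-tailed (power-law) decay over the time lag."""
    t = q.size(-2)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    idx = torch.arange(t, device=q.device)
    lag = idx.unsqueeze(1) - idx.unsqueeze(0)           # lag[i, j] = i - j
    decay = (1.0 + lag.clamp(min=0).float()) ** -alpha  # assumed power-law form
    attn = F.softmax(scores.masked_fill(lag < 0, float("-inf")), dim=-1)
    attn = attn * decay                                 # reweight causal weights
    return (attn / attn.sum(-1, keepdim=True)) @ v      # renormalize, mix values

q = k = v = torch.randn(2, 4, 16, 8)
out = weighted_causal_attention(q, k, v, alpha=0.5)     # (2, 4, 16, 8)
```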
- Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification [25.27495694566081]
We propose an auxiliary content-aware balanced decoder (CBD) to optimize encoding quality in the spectrum space within a masked modeling scheme.
CBD iterates over a series of fundamental blocks and, thanks to two tailored units, each block progressively refines the masked representation.
arXiv Detail & Related papers (2024-12-17T14:12:20Z)
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
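A loose sketch of the premise in the PRformer abstract: encode temporal order with a pyramid of recurrent encoders instead of positional embeddings, then feed the resulting tokens to a standard Transformer encoder. The average-pooling pyramid, the GRU, and all sizes are guesses at what "PRE" might look like, not the published architecture.

```python
import torch
import torch.nn as nn

class RecurrentTemporalEmbedding(nn.Module):
    """Order-aware token embeddings from a pyramid of recurrent encoders."""
    def __init__(self, d_model: int, scales=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AvgPool1d(s, stride=s) for s in scales)
        self.grus = nn.ModuleList(nn.GRU(1, d_model, batch_first=True) for _ in scales)

    def forward(self, x):                         # x: (batch, seq_len), one variate
        tokens = []
        for pool, gru in zip(self.pools, self.grus):
            xs = pool(x.unsqueeze(1)).transpose(1, 2)  # (batch, seq_len/s, 1)
            _, h = gru(xs)                             # last hidden state carries order
            tokens.append(h.squeeze(0))                # (batch, d_model)
        return torch.stack(tokens, dim=1)              # (batch, n_scales, d_model)

# The tokens already encode temporal order through the recurrence, so the
# Transformer encoder is used without any positional embedding.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2
)
out = encoder(RecurrentTemporalEmbedding(64)(torch.randn(8, 96)))  # (8, 3, 64)
```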
- Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer [18.459822172890473]
We introduce a frequency-aware attention module to unweave skeleton frequency representations.
We also develop a mixed transformer architecture to incorporate spatial features with frequency features.
Experiments show that FreqMiXFormer outperforms state-of-the-art methods on three popular skeleton action recognition datasets.
arXiv Detail & Related papers (2024-07-17T05:47:27Z)
- Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting [12.989064148254936]
We present a deep frequency derivative learning framework, DERITS, for non-stationary time series forecasting.
Specifically, DERITS is built upon a novel reversible transformation, namely the Frequency Derivative Transformation (FDT).
arXiv Detail & Related papers (2024-06-29T17:56:59Z)
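The DERITS abstract names a reversible Frequency Derivative Transformation without defining it, so the following is only one plausible construction: differentiate by multiplying the rFFT spectrum by i·2πf and invert by dividing, carrying the DC bin separately so the map stays invertible. This is my construction, not necessarily the paper's FDT.

```python
import torch

def fdt(x):                                      # x: (..., seq_len), real-valued
    """Frequency-domain differentiation; returns derivative spectrum plus DC."""
    spec = torch.fft.rfft(x, dim=-1)
    freq = 2j * torch.pi * torch.fft.rfftfreq(x.size(-1))  # i*2*pi*f per bin
    return spec * freq, spec[..., :1]            # keep DC so the map is invertible

def inverse_fdt(dspec, dc, n):
    freq = 2j * torch.pi * torch.fft.rfftfreq(n)
    spec = torch.zeros_like(dspec)
    spec[..., 1:] = dspec[..., 1:] / freq[1:]    # undo the multiplication
    spec[..., :1] = dc                           # restore the DC component
    return torch.fft.irfft(spec, n=n, dim=-1)

x = torch.randn(4, 96)
d, dc = fdt(x)
assert torch.allclose(inverse_fdt(d, dc, 96), x, atol=1e-4)  # round-trips
```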
- A Joint Time-frequency Domain Transformer for Multivariate Time Series Forecasting [7.501660339993144]
This paper introduces the Joint Time-Frequency Domain Transformer (JTFT).
JTFT combines time and frequency domain representations to make predictions.
Experimental results on six real-world datasets demonstrate that JTFT outperforms state-of-the-art baselines in predictive performance.
arXiv Detail & Related papers (2023-05-24T02:37:23Z)
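A minimal sketch of a joint time-frequency representation in the spirit of the JTFT abstract: concatenate the raw time-domain window with the k largest-magnitude frequency components as a compact spectral summary. The value of `k` and the feature layout are assumptions.

```python
import torch

def joint_time_frequency_features(x, k: int = 8):
    """x: (batch, seq_len) -> (batch, seq_len + 2*k) joint representation."""
    spec = torch.fft.rfft(x, dim=-1)
    idx = spec.abs().topk(k, dim=-1).indices     # the k strongest frequencies
    re = torch.gather(spec.real, -1, idx)        # their real parts
    im = torch.gather(spec.imag, -1, idx)        # and imaginary parts
    return torch.cat([x, re, im], dim=-1)        # time view + sparse frequency view

feats = joint_time_frequency_features(torch.randn(8, 96))  # shape (8, 112)
```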
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure of the frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency-domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
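A sketch of the "transform once" idea as the abstract states it: pay for one forward DFT, keep all learned operations in the frequency domain across layers, and pay for one inverse DFT at the end, rather than transforming in and out at every layer. The pointwise complex filters below are a stand-in for whatever operator parameterization T1 actually uses.

```python
import torch
import torch.nn as nn

class TransformOnce(nn.Module):
    """One forward DFT, several frequency-domain layers, one inverse DFT."""
    def __init__(self, seq_len: int, n_layers: int = 3):
        super().__init__()
        n_freq = seq_len // 2 + 1
        # One complex pointwise filter per layer, stored as (real, imag) pairs.
        self.filters = nn.Parameter(torch.randn(n_layers, n_freq, 2) * 0.02)
        self.seq_len = seq_len

    def forward(self, x):                        # x: (batch, seq_len)
        spec = torch.fft.rfft(x, dim=-1)         # the single forward transform
        for f in self.filters:                   # all learning stays in freq domain
            spec = spec * torch.view_as_complex(f)
        return torch.fft.irfft(spec, n=self.seq_len, dim=-1)  # single inverse

y = TransformOnce(96)(torch.randn(8, 96))        # (8, 96)
```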
- Masked Frequency Modeling for Self-Supervised Visual Pre-Training [102.89756957704138]
We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.
MFM first masks out a portion of the frequency components of the input image and then predicts the missing frequencies on the frequency spectrum.
For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations without (i) extra data, (ii) an extra model, or (iii) mask tokens.
arXiv Detail & Related papers (2022-06-15T17:58:30Z)
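A hedged sketch of the pre-training setup the MFM abstract describes: remove a portion of an image's frequency components, then train a model to predict the missing frequencies on the spectrum. The random mask pattern, the reconstruction loss, and the toy stand-in model in the usage line are assumptions.

```python
import torch
import torch.nn as nn

def mfm_step(model, images, mask_ratio: float = 0.5):
    """images: (batch, C, H, W). Returns a masked-spectrum prediction loss."""
    spec = torch.fft.fft2(images)                # per-channel 2-D spectrum
    mask = torch.rand(images.shape[-2:], device=images.device) < mask_ratio
    corrupted = torch.fft.ifft2(spec * (~mask).float()).real  # freqs removed
    pred_spec = torch.fft.fft2(model(corrupted)) # predict the full spectrum
    diff = (pred_spec - spec)[..., mask]         # supervise masked bins only
    return diff.abs().pow(2).mean()

# Toy usage with a shape-preserving stand-in model:
loss = mfm_step(nn.Conv2d(3, 3, 3, padding=1), torch.randn(2, 3, 32, 32))
```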
- FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture in which domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate strong performance, outperforming state-of-the-art methods by margins of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
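A rough sketch of frequency-domain filtering as the FAMLP abstract describes it: move token features into the frequency domain, suppress components with a learnable gate (the suppressed components playing the role of domain-specific features), and transform back. The sigmoid gate and layer shapes are assumptions; the actual FAMLP design may differ.

```python
import torch
import torch.nn as nn

class FrequencyFilterLayer(nn.Module):
    """Gate token features in the frequency domain, then transform back."""
    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        # Learnable gate per (frequency, channel); components the gate pushes
        # toward zero are filtered out as (putatively) domain-specific.
        self.gate = nn.Parameter(torch.ones(n_tokens // 2 + 1, d_model))

    def forward(self, x):                        # x: (batch, tokens, d_model)
        spec = torch.fft.rfft(x, dim=1)          # spectrum along the token axis
        spec = spec * torch.sigmoid(self.gate)   # suppress gated components
        return torch.fft.irfft(spec, n=x.size(1), dim=1)

out = FrequencyFilterLayer(16, 64)(torch.randn(8, 16, 64))  # (8, 16, 64)
```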