Attention as Robust Representation for Time Series Forecasting
- URL: http://arxiv.org/abs/2402.05370v1
- Date: Thu, 8 Feb 2024 03:00:50 GMT
- Title: Attention as Robust Representation for Time Series Forecasting
- Authors: PeiSong Niu, Tian Zhou, Xue Wang, Liang Sun, Rong Jin
- Abstract summary: Time series forecasting is essential for many practical applications.
Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, often relegating attention weights to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
- Score: 23.292260325891032
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Time series forecasting is essential for many practical applications, with
the adoption of transformer-based models on the rise due to their impressive
performance in NLP and CV. Transformers' key feature, the attention mechanism,
dynamically fuses embeddings to enhance data representation, often relegating
attention weights to a byproduct role. Yet, time series data, characterized by
noise and non-stationarity, poses significant forecasting challenges. Our
approach elevates attention weights as the primary representation for time
series, capitalizing on the temporal relationships among data points to improve
forecasting accuracy. Our study shows that an attention map, structured using
global landmarks and local windows, acts as a robust kernel representation for
data points, withstanding noise and shifts in distribution. Our method
outperforms state-of-the-art models, reducing mean squared error (MSE) in
multivariate time series forecasting by a notable 3.6% without altering the
core neural network architecture. It serves as a versatile component that can
readily replace recent patching-based embedding schemes in transformer-based
models, boosting their performance.
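The abstract's idea of using attention weights themselves as the representation, structured by global landmarks and local windows, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `attention_representation`, the evenly spaced landmark selection, the random projections standing in for learned weights, and the edge-replication padding are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_representation(series, num_landmarks=4, window=3, dim=8, seed=0):
    """Represent each time step by its attention weights (hypothetical sketch).

    series: 1-D array of shape (T,).
    Returns an array of shape (T, num_landmarks + window): attention over
    global landmark steps, concatenated with attention over a local window.
    """
    rng = np.random.default_rng(seed)
    T = series.shape[0]

    # Assumption: random projections stand in for learned query/key weights.
    w_q = rng.normal(size=(1, dim))
    w_k = rng.normal(size=(1, dim))
    x = series[:, None]          # (T, 1)
    q = x @ w_q                  # (T, dim) queries
    k = x @ w_k                  # (T, dim) keys

    # Global part: attend to evenly spaced landmark steps (an assumption).
    landmark_idx = np.linspace(0, T - 1, num_landmarks).astype(int)
    global_scores = q @ k[landmark_idx].T / np.sqrt(dim)      # (T, L)
    global_attn = softmax(global_scores)

    # Local part: attend to the current step and the window - 1 steps before
    # it; the start of the series is padded by repeating the first key.
    k_pad = np.concatenate([np.repeat(k[:1], window - 1, axis=0), k], axis=0)
    offsets = np.arange(T)[:, None] + np.arange(window)[None, :]  # (T, window)
    local_keys = k_pad[offsets]                               # (T, window, dim)
    local_scores = np.einsum("td,twd->tw", q, local_keys) / np.sqrt(dim)
    local_attn = softmax(local_scores)

    # The attention weights, not the fused embeddings, form the representation.
    return np.concatenate([global_attn, local_attn], axis=1)


t = np.arange(96)
noisy = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(1).normal(size=96)
rep = attention_representation(noisy)
print(rep.shape)  # (96, 7)
```

Each row is a pair of probability distributions, one over the landmarks and one over the local window, so the representation stays bounded regardless of the scale of the raw values.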
Related papers
- Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.
Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.
Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
- EDformer: Embedded Decomposition Transformer for Interpretable Multivariate Time Series Predictions [4.075971633195745]
This paper introduces an embedded transformer, 'EDformer', for time series forecasting tasks.
Without altering the fundamental elements, we reuse the Transformer architecture and consider the capable functions of its constituent parts.
The model achieves state-of-the-art prediction results in terms of accuracy and efficiency on complex real-world time series datasets.
arXiv Detail & Related papers (2024-12-16T11:13:57Z)
- PSformer: Parameter-efficient Transformer with Segment Attention for Time Series Forecasting [21.033660755921737]
Time series forecasting remains a critical challenge across various domains, often complicated by high-dimensional data and long-term dependencies.
This paper presents a novel transformer architecture for time series forecasting, incorporating two key innovations: parameter sharing (PS) and Spatial-Temporal Segment Attention (SegAtt).
arXiv Detail & Related papers (2024-11-03T03:04:00Z)
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series Forecasting Approach [71.67506068703314]
Long-term urban mobility predictions play a crucial role in the effective management of urban facilities and services.
Traditionally, urban mobility data has been structured as videos, treating longitude and latitude as fundamental pixels.
In our research, we introduce a fresh perspective on urban mobility prediction.
Instead of oversimplifying urban mobility data as traditional video data, we regard it as a complex time series.
arXiv Detail & Related papers (2023-12-04T07:39:05Z)
- Ti-MAE: Self-Supervised Masked Time Series Autoencoders [16.98069693152999]
We propose a novel framework named Ti-MAE, in which the input time series are assumed to follow an integrate distribution.
Ti-MAE randomly masks out embedded time series data and learns an autoencoder to reconstruct them at the point-level.
Experiments on several public real-world datasets demonstrate that our framework of masked autoencoding could learn strong representations directly from the raw data.
arXiv Detail & Related papers (2023-01-21T03:20:23Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of their computationally intensive self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- DRAformer: Differentially Reconstructed Attention Transformer for Time-Series Forecasting [7.805077630467324]
Time-series forecasting plays an important role in many real-world scenarios, such as equipment life cycle forecasting, weather forecasting, and traffic flow forecasting.
It can be observed from recent research that a variety of transformer-based models have shown remarkable results in time-series forecasting.
However, there are still some issues that limit the ability of transformer-based models on time-series forecasting tasks.
arXiv Detail & Related papers (2022-06-11T10:34:29Z)
- Deep Autoregressive Models with Spectral Attention [74.08846528440024]
We propose a forecasting architecture that combines deep autoregressive models with a Spectral Attention (SA) module.
By characterizing in the spectral domain the embedding of the time series as occurrences of a random process, our method can identify global trends and seasonality patterns.
Two spectral attention models, global and local to the time series, integrate this information within the forecast and perform spectral filtering to remove noise from the time series.
arXiv Detail & Related papers (2021-07-13T11:08:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.