LSEAttention is All You Need for Time Series Forecasting
- URL: http://arxiv.org/abs/2410.23749v3
- Date: Thu, 30 Jan 2025 13:50:52 GMT
- Title: LSEAttention is All You Need for Time Series Forecasting
- Authors: Dizhen Liang
- Abstract summary: Transformer-based architectures have achieved remarkable success in natural language processing and computer vision.
Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain.
We introduce LATST, a novel approach designed to mitigate entropy collapse and training instability, common challenges in Transformer-based time series forecasting.
- Abstract: Transformer-based architectures have achieved remarkable success in natural language processing and computer vision. However, their performance in multivariate long-term forecasting often falls short compared to simpler linear baselines. Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain. To bridge this gap, we introduce LATST, a novel approach designed to mitigate entropy collapse and training instability, common challenges in Transformer-based time series forecasting. We rigorously evaluate LATST across multiple real-world multivariate time series datasets, demonstrating its ability to outperform existing state-of-the-art Transformer models. Notably, LATST manages to achieve competitive performance with fewer parameters than some linear models on certain datasets, highlighting its efficiency and effectiveness.
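The abstract does not spell out the mechanism, but the name suggests attention stabilised with the log-sum-exp (LSE) trick. The following is a minimal, hypothetical sketch of scaled dot-product attention with an LSE-stabilised softmax; the shapes and helper names are illustrative assumptions, not the authors' LATST implementation.

```python
import numpy as np

def lse_softmax(scores):
    """Numerically stable softmax via the log-sum-exp trick.

    Subtracting the row-wise max before exponentiating keeps exp() from
    overflowing, one standard way to stabilise attention training.
    """
    shifted = scores - scores.max(axis=-1, keepdims=True)
    log_norm = np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return np.exp(shifted - log_norm)

def attention(q, k, v):
    """Single-head scaled dot-product attention over a time-series window.

    q, k, v: arrays of shape (seq_len, d_model); purely illustrative shapes.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)   # (seq_len, seq_len) similarity scores
    weights = lse_softmax(scores)   # stable attention weights
    return weights @ v              # weighted sum of values

# Toy usage: a 96-step window with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((96, 8))
print(attention(x, x, x).shape)  # (96, 8)
```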
Related papers
- Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.
Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.
Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
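A rough sketch of the described idea, causal attention reweighted by a smooth power-law decay, is given below; the decay form and the `alpha` exponent are assumptions for illustration, not the exact weighting used in Powerformer.

```python
import numpy as np

def powerformer_style_weights(scores, alpha=1.0):
    """Illustrative causal attention with a smooth power-law decay reweighting.

    scores: (seq_len, seq_len) raw query-key scores.
    alpha:  hypothetical decay exponent controlling the locality bias.
    """
    seq_len = scores.shape[-1]
    lag = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
    causal = lag >= 0                                  # no attention to future steps
    decay = np.where(causal, (1.0 + lag) ** (-alpha), 0.0)

    # Standard causal softmax ...
    masked = np.where(causal, scores, -np.inf)
    masked -= masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)

    # ... then reweight by the heavy-tailed decay and renormalise.
    weights *= decay
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy usage on random scores for an 8-step window.
scores = np.random.default_rng(0).standard_normal((8, 8))
print(powerformer_style_weights(scores, alpha=1.5).round(2))
```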
arXiv Detail & Related papers (2025-02-10T04:42:11Z) - Are Self-Attentions Effective for Time Series Forecasting? [4.990206466948269]
Time series forecasting is crucial for applications across multiple domains and various scenarios.
Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches.
We introduce a new architecture, the Cross-Attention-only Time Series transformer (CATS).
Our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
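The toy sketch below illustrates a cross-attention-only forecasting step in which learnable per-horizon queries attend to the encoded history; the shapes and the random "learnable" queries are assumptions for illustration, not the CATS architecture itself.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention_forecast(history, horizon_queries):
    """Illustrative cross-attention-only step: one learnable query per
    future horizon step attends to the encoded history (keys = values here).

    history:         (seq_len, d) encoded past observations.
    horizon_queries: (horizon, d) query vectors (random in this toy example).
    """
    d = history.shape[-1]
    scores = horizon_queries @ history.T / np.sqrt(d)   # (horizon, seq_len)
    return softmax(scores) @ history                     # (horizon, d) forecast tokens

rng = np.random.default_rng(0)
hist = rng.standard_normal((96, 16))
queries = rng.standard_normal((24, 16))
print(cross_attention_forecast(hist, queries).shape)     # (24, 16)
```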
arXiv Detail & Related papers (2024-05-27T06:49:39Z) - Unlocking the Power of Patch: Patch-Based MLP for Long-Term Time Series Forecasting [0.0]
Recent studies have attempted to refine the Transformer architecture to demonstrate its effectiveness in Long-Term Time Series Forecasting tasks.
We attribute the effectiveness of these models largely to the adopted Patch mechanism.
We propose a novel and simple patch-based MLP (PatchMLP) for LTSF tasks.
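A toy sketch of the patch mechanism the summary refers to, splitting a series into patches and mapping them through an MLP to a forecast, is shown below; the patch length, stride, and random weights are illustrative assumptions rather than the PatchMLP model itself.

```python
import numpy as np

def patch_series(x, patch_len, stride):
    """Split a univariate series x of shape (seq_len,) into overlapping patches."""
    starts = range(0, len(x) - patch_len + 1, stride)
    return np.stack([x[s:s + patch_len] for s in starts])   # (n_patches, patch_len)

def patch_mlp_forecast(x, horizon, patch_len=16, stride=8, hidden=64, seed=0):
    """Toy patch-based MLP forecaster: embed each patch with a shared MLP,
    flatten, then project to the forecast horizon. Weights are random here,
    purely to illustrate the data flow, not a trained model."""
    rng = np.random.default_rng(seed)
    patches = patch_series(x, patch_len, stride)              # (n_patches, patch_len)
    w1 = rng.standard_normal((patch_len, hidden)) * 0.1
    w2 = rng.standard_normal((patches.shape[0] * hidden, horizon)) * 0.1
    embedded = np.maximum(patches @ w1, 0.0)                  # shared per-patch MLP (ReLU)
    return embedded.reshape(-1) @ w2                           # (horizon,) point forecast

forecast = patch_mlp_forecast(np.sin(np.linspace(0, 20, 96)), horizon=24)
print(forecast.shape)  # (24,)
```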
arXiv Detail & Related papers (2024-05-22T12:12:20Z) - UniTime: A Language-Empowered Unified Model for Cross-Domain Time Series Forecasting [59.11817101030137]
This research advocates for a unified model paradigm that transcends domain boundaries.
Learning an effective cross-domain model presents the following challenges.
We propose UniTime for effective cross-domain time series learning.
arXiv Detail & Related papers (2023-10-15T06:30:22Z) - TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently introduced transformer-based attentional copulas (TACTiS).
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z) - Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks.
We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting.
We propose a set of alternative models that are better performing and significantly less complex.
arXiv Detail & Related papers (2023-04-10T12:47:42Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity through the self-attention mechanism, despite its high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Spatial-Temporal Identity: A Simple yet Effective Baseline for Multivariate Time Series Forecasting [17.84296081495185]
We explore the critical factors of MTS forecasting and design a model that is as powerful as STGNNs, but more concise and efficient.
We identify the indistinguishability of samples in both spatial and temporal dimensions as a key bottleneck, and propose a simple yet effective baseline for MTS forecasting.
These results suggest that we can design efficient and effective models as long as they solve the indistinguishability of samples, without being limited to STGNNs.
arXiv Detail & Related papers (2022-08-10T09:25:43Z) - ETSformer: Exponential Smoothing Transformers for Time-series Forecasting [35.76867542099019]
We propose ETSformer, a novel time-series Transformer architecture, which exploits the principle of exponential smoothing to improve Transformers for time-series forecasting.
In particular, inspired by the classical exponential smoothing methods in time-series forecasting, we propose the novel exponential smoothing attention (ESA) and frequency attention (FA) to replace the self-attention mechanism in vanilla Transformers, thus improving both accuracy and efficiency.
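A minimal sketch of exponential-smoothing-style attention weights, where each step attends to the past with weights alpha*(1-alpha)^lag instead of query-key softmax scores, is given below; the smoothing factor and normalisation are assumptions, not the paper's exact ESA formulation.

```python
import numpy as np

def exponential_smoothing_attention(values, alpha=0.3):
    """Illustrative exponential-smoothing-style attention for a time series.

    values: (seq_len, d) token values; each output step is a convex combination
    of current and past steps with weights alpha * (1 - alpha)**lag, echoing
    classical exponential smoothing rather than query-key softmax attention.
    alpha:  hypothetical smoothing factor in (0, 1).
    """
    seq_len = values.shape[0]
    lag = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
    weights = np.where(lag >= 0, alpha * (1.0 - alpha) ** lag, 0.0)
    weights /= weights.sum(axis=-1, keepdims=True)   # normalise each row
    return weights @ values

# Toy usage on a random 96-step, 8-dimensional sequence.
vals = np.random.default_rng(0).standard_normal((96, 8))
print(exponential_smoothing_attention(vals).shape)   # (96, 8)
```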
arXiv Detail & Related papers (2022-02-03T02:50:44Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
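For context, a classical self-exciting Hawkes intensity (a baseline rate plus exponentially decaying excitation from past events) can be sketched as follows; THP replaces this fixed parametric form with self-attention over the event history, and the parameter values below are illustrative only.

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """Classical self-exciting Hawkes intensity at time t.

    mu:    baseline event rate.
    alpha: excitation strength contributed by each past event.
    beta:  exponential decay rate of that excitation.
    All values are illustrative, not taken from the paper.
    """
    past = event_times[event_times < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = np.array([0.5, 1.2, 1.3, 2.7])
print(hawkes_intensity(3.0, events))
```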
arXiv Detail & Related papers (2020-02-21T13:48:13Z)