Transformer Multivariate Forecasting: Less is More?
- URL: http://arxiv.org/abs/2401.00230v2
- Date: Thu, 7 Mar 2024 10:25:20 GMT
- Title: Transformer Multivariate Forecasting: Less is More?
- Authors: Jingjing Xu, Caesar Wu, Yuan-Fang Li, Pascal Bouvry
- Abstract summary: The paper focuses on reducing redundant information to elevate forecasting accuracy while optimizing runtime efficiency.
The framework is evaluated with five state-of-the-art (SOTA) models on four diverse real-world datasets.
From the model perspective, one of the PCA-enhanced models, PCA+Crossformer, reduces mean squared error (MSE) by 33.3% and decreases runtime by 49.2% on average.
- Score: 42.558736426375056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the domain of multivariate forecasting, transformer models stand out as
powerful tools, displaying exceptional capabilities in handling messy
datasets from real-world contexts. However, the inherent complexity of these
datasets, characterized by numerous variables and lengthy temporal sequences,
poses challenges, including increased noise and extended model runtime. This
paper focuses on reducing redundant information to elevate forecasting accuracy
while optimizing runtime efficiency. We propose a novel transformer forecasting
framework enhanced by Principal Component Analysis (PCA) to tackle this
challenge. The framework is evaluated with five state-of-the-art (SOTA) models
on four diverse real-world datasets. Our experimental results demonstrate the
framework's ability to minimize prediction errors across all models and
datasets while significantly reducing runtime. From the model perspective, one
of the PCA-enhanced models, PCA+Crossformer, reduces mean squared error (MSE)
by 33.3% and decreases runtime by 49.2% on average. From the dataset
perspective, the framework delivers 14.3% MSE and 76.6% runtime reduction on
Electricity datasets, as well as 4.8% MSE and 86.9% runtime reduction on
Traffic datasets. This study aims to advance various SOTA models and enhance
transformer-based time series forecasting for intricate data. Code is available
at: https://github.com/jingjing-unilu/PCA_Transformer.
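The abstract describes the PCA-transformer combination only at a high level. Below is a minimal sketch of one plausible arrangement of the idea, assuming PCA is fit on the training split and used to compress the variable dimension of each input window before forecasting; the synthetic data, window sizes, component count, and the LinearRegression stand-in for the paper's transformer backbones (e.g. Crossformer) are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def make_windows(series, lookback, horizon):
    """Slice a (time, variables) array into (input window, target window) pairs."""
    X, Y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t:t + lookback])
        Y.append(series[t + lookback:t + lookback + horizon])
    return np.stack(X), np.stack(Y)

# Toy stand-in for a messy multivariate series: 1000 steps, 20 correlated variables.
rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 3)).cumsum(axis=0)
series = latent @ rng.standard_normal((3, 20)) + 0.1 * rng.standard_normal((1000, 20))

lookback, horizon, n_train = 96, 24, 600

# 1) Fit PCA on the training portion only, then compress the variable dimension.
pca = PCA(n_components=5).fit(series[:n_train + lookback])
reduced = pca.transform(series)          # (time, 5) channels instead of (time, 20)

# 2) Supervised windows: compressed inputs, targets kept in the original space.
X_red, _ = make_windows(reduced, lookback, horizon)
_, Y = make_windows(series, lookback, horizon)

# 3) Forecast from the compressed inputs (a linear model stands in here for the
#    transformer backbones such as Crossformer used in the paper).
model = LinearRegression().fit(
    X_red[:n_train].reshape(n_train, -1), Y[:n_train].reshape(n_train, -1)
)
pred = model.predict(X_red[n_train:].reshape(len(X_red) - n_train, -1))
mse = np.mean((pred - Y[n_train:].reshape(len(Y) - n_train, -1)) ** 2)
print(f"test MSE: {mse:.4f}")
```

Because the forecaster then sees far fewer input channels per time step, both the noise it has to fit and its runtime shrink, which matches the abstract's motivation of removing redundant information to improve accuracy and runtime together.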
Related papers
- Less is more: Embracing sparsity and interpolation with Esiformer for time series forecasting [19.8447763392479]
Time series data generated from real-world applications always exhibits high variance and substantial noise.
We propose the Esiformer, which applies interpolation to the original data, decreasing its overall variance and alleviating the influence of noise.
Our method outperforms the leading model PatchTST, reducing MSE by 6.5% and MAE by 5.8%.
arXiv Detail & Related papers (2024-10-08T06:45:47Z) - Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - Are Self-Attentions Effective for Time Series Forecasting? [4.990206466948269]
Time series forecasting is crucial for applications across multiple domains and various scenarios.
Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches.
We introduce a new architecture, the Cross-Attention-only Time Series transformer (CATS).
Our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
arXiv Detail & Related papers (2024-05-27T06:49:39Z) - Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, often relegating attention weights to a byproduct role.
Our approach instead elevates attention weights to the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
arXiv Detail & Related papers (2024-02-08T03:00:50Z) - A Transformer-based Framework For Multi-variate Time Series: A Remaining Useful Life Prediction Use Case [4.0466311968093365]
This work proposes an encoder-transformer architecture-based framework for time series prediction.
We validate the effectiveness of the proposed framework on all four sets of the C-MAPSS benchmark dataset.
To make the model aware of the initial stages of the machine's life and its degradation path, a novel expanding window method is proposed.
arXiv Detail & Related papers (2023-08-19T02:30:35Z) - A Meta-Learning Approach to Predicting Performance and Data Requirements [163.4412093478316]
We propose an approach to estimate the number of samples required for a model to reach a target performance.
We find that the power law, the de facto principle for estimating model performance, leads to large errors when using a small dataset.
We introduce a novel piecewise power law (PPL) that handles the two data regimes differently.
arXiv Detail & Related papers (2023-03-02T21:48:22Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity thanks to the self-attention mechanism, despite its high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Towards Data-Efficient Detection Transformers [77.43470797296906]
We show that most detection transformers suffer significant performance drops on small-size datasets.
We empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR.
We introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
arXiv Detail & Related papers (2022-03-17T17:56:34Z) - Towards physically consistent data-driven weather forecasting: Integrating data assimilation with equivariance-preserving deep spatial transformers [2.7998963147546148]
We propose three components to integrate with commonly used data-driven weather prediction models.
These components are 1) a deep spatial transformer added to the latent space of U-Nets to preserve equivariance, 2) a data-assimilation algorithm to ingest noisy observations and improve the initial conditions for the next forecasts, and 3) a multi-time-step algorithm that improves the accuracy of forecasts at short intervals.
arXiv Detail & Related papers (2021-03-16T23:15:00Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.