CITRAS: Covariate-Informed Transformer for Time Series Forecasting
- URL: http://arxiv.org/abs/2503.24007v1
- Date: Mon, 31 Mar 2025 12:32:23 GMT
- Title: CITRAS: Covariate-Informed Transformer for Time Series Forecasting
- Authors: Yosuke Yamaguchi, Issei Suemitsu, Wenpeng Wei,
- Abstract summary: CITRAS is a patch-based Transformer that flexibly leverages multiple targets and covariates covering both the past and the future horizon. It achieves state-of-the-art performance in both covariate-informed and multivariate forecasting.
- Score: 0.49157446832511503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Covariates play an indispensable role in practical time series forecasting, offering rich context from the past and sometimes extending into the future. However, their availability varies depending on the scenario, and situations often involve multiple target variables simultaneously. Moreover, the cross-variate dependencies between them are multi-granular, with some covariates having a short-term impact on target variables and others showing long-term correlations. This heterogeneity and the intricate dependencies arising in covariate-informed forecasting present significant challenges to existing deep models. To address these issues, we propose CITRAS, a patch-based Transformer that flexibly leverages multiple targets and covariates covering both the past and the future forecasting horizon. While preserving the strong autoregressive capabilities of the canonical Transformer, CITRAS introduces two novel mechanisms in patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates future known covariates into the forecasting of target variables based on their concurrent dependencies. Additionally, Attention Score Smoothing transforms locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the past series of attention scores. Experimentally, CITRAS achieves state-of-the-art performance in both covariate-informed and multivariate forecasting, demonstrating its versatile ability to leverage cross-variate dependency for improved forecasting accuracy.
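The abstract names KV Shift and Attention Score Smoothing but does not spell out their equations here. The following NumPy sketch shows one plausible reading of each mechanism: shifting covariate key/value patches so target queries can attend to future-known covariate patches, and causally averaging the past series of patch-wise attention scores. The shift-by-one convention, the moving-average window, and all shapes and names are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of the two mechanisms described in the abstract (assumed forms).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kv_shift(cov_patches):
    """KV Shift (assumed form): shift covariate key/value patches back by one
    position so the query for target patch t attends to the covariate patch at
    t+1, exposing future-known covariate information to the forecast."""
    shifted = np.roll(cov_patches, -1, axis=0)
    shifted[-1] = cov_patches[-1]  # pad the final position (assumption)
    return shifted

def smooth_scores(scores, window=4):
    """Attention Score Smoothing (assumed form): a causal moving average over
    the past series of patch-wise attention scores, turning locally accurate
    patch-level dependencies into a more stable variate-level estimate."""
    smoothed = np.empty_like(scores)
    for t in range(scores.shape[0]):
        lo = max(0, t - window + 1)
        smoothed[t] = scores[lo:t + 1].mean(axis=0)
    return smoothed

# Toy shapes: P patches, D model dimension, one target and one covariate series.
P, D = 8, 16
rng = np.random.default_rng(0)
q_target = rng.normal(size=(P, D))   # queries from target patches
k_cov = rng.normal(size=(P, D))      # keys from covariate patches
v_cov = rng.normal(size=(P, D))      # values from covariate patches

k_cov, v_cov = kv_shift(k_cov), kv_shift(v_cov)     # align future-known covariates
scores = softmax(q_target @ k_cov.T / np.sqrt(D))   # patch-wise cross-variate scores
scores = smooth_scores(scores, window=4)            # smooth over past patches
out = scores @ v_cov                                # covariate-informed target update
print(out.shape)                                    # (8, 16)
```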
Related papers
- Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations [2.2091590689610823]
We re-purpose the Transformer architecture to model both cross-time and cross-variate dependencies.
Our method achieves state-of-the-art performance across 13 real-world datasets, delivering performance improvements up to 20.7% over original models.
arXiv Detail & Related papers (2025-05-01T04:59:05Z)
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a causal Transformer for unified time series forecasting. Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting [49.6208017412376]
TimeBridge is a novel framework designed to bridge the gap between non-stationarity and dependency modeling.
TimeBridge consistently achieves state-of-the-art performance in both short-term and long-term forecasting.
arXiv Detail & Related papers (2024-10-06T10:41:03Z)
- DLFormer: Enhancing Explainability in Multivariate Time Series Forecasting using Distributed Lag Embedding [4.995397953581609]
This study introduces DLFormer, an attention-based architecture integrated with distributed lag embedding.
It showcases superior performance improvements compared to existing attention-based high-performance models.
arXiv Detail & Related papers (2024-08-29T20:39:54Z)
- VCformer: Variable Correlation Transformer with Inherent Lagged Correlation for Multivariate Time Series Forecasting [1.5165632546654102]
We propose Variable Correlation Transformer (VCformer) to mine the correlations among variables.
VCA calculates and integrates the cross-correlation scores corresponding to different lags between queries and keys.
Inspired by Koopman dynamics theory, we also develop Koopman Temporal Detector (KTD) to better address the non-stationarity in time series.
arXiv Detail & Related papers (2024-05-19T07:39:22Z)
- TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables [75.83318701911274]
TimeXer ingests external information to enhance the forecasting of endogenous variables.
TimeXer achieves consistent state-of-the-art performance on twelve real-world forecasting benchmarks.
arXiv Detail & Related papers (2024-02-29T11:54:35Z)
- Cross-LKTCN: Modern Convolution Utilizing Cross-Variable Dependency for Multivariate Time Series Forecasting [9.433527676880903]
Key to accurate forecasting results is capturing long-term dependencies across time steps.
Recent methods mainly focus on the cross-time dependency but seldom consider the cross-variable dependency.
We propose a modern pure convolution structure, namely Cross-LKTCN, to better utilize both cross-time and cross-variable dependency.
arXiv Detail & Related papers (2023-06-04T10:50:52Z)
- Copula Variational LSTM for High-dimensional Cross-market Multivariate Dependence Modeling [46.75628526959982]
We make the first attempt to integrate variational sequential neural learning with copula-based dependence modeling.
Our variational neural network WPVC-VLSTM models variational sequential dependence degrees and structures across time series.
It outperforms benchmarks including linear models, volatility models, deep neural networks, and variational recurrent networks in cross-market portfolio forecasting.
arXiv Detail & Related papers (2023-05-09T08:19:08Z)
- Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift [50.98086766507025]
We propose a simple yet effective data augmentation strategy, Adversarial Invariant Augmentation (AIA).
AIA aims to extrapolate and generate new environments, while concurrently preserving the original stable features during the augmentation process.
arXiv Detail & Related papers (2022-11-05T07:55:55Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example in which THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)