MTS-Mixers: Multivariate Time Series Forecasting via Factorized Temporal
and Channel Mixing
- URL: http://arxiv.org/abs/2302.04501v1
- Date: Thu, 9 Feb 2023 08:52:49 GMT
- Authors: Zhe Li, Zhongwen Rao, Lujia Pan, Zenglin Xu
- Abstract summary: This paper investigates the contributions and deficiencies of attention mechanisms on the performance of time series forecasting.
We propose MTS-Mixers, which use two factorized modules to capture temporal and channel dependencies.
Experimental results on several real-world datasets show that MTS-Mixers outperform existing Transformer-based models with higher efficiency.
- Score: 18.058617044421293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multivariate time series forecasting has been widely used in various
practical scenarios. Recently, Transformer-based models have shown significant
potential in forecasting tasks owing to their ability to capture long-range
dependencies. However, recent studies in the vision and NLP fields show that the
role of attention modules is unclear and that they can be replaced by other
token aggregation operations. This paper investigates the contributions and
deficiencies of attention mechanisms on the performance of time series
forecasting. Specifically, we find that (1) attention is not necessary for
capturing temporal dependencies, (2) the entanglement and redundancy in the
capture of temporal and channel interaction affect the forecasting performance,
and (3) it is important to model the mapping between the input and the
prediction sequence. To this end, we propose MTS-Mixers, which use two
factorized modules to capture temporal and channel dependencies. Experimental
results on several real-world datasets show that MTS-Mixers outperform existing
Transformer-based models with higher efficiency.
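
The idea of factorized temporal and channel mixing can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the purely linear mixing layers, the residual connections, and the weight shapes are all assumptions chosen for brevity, in the spirit of MLP-Mixer-style models. It shows one attention-free way to mix along the time axis and the channel axis separately, followed by a learned mapping from the input length to the prediction length.

```python
import numpy as np

def init_params(seq_len, n_channels, horizon, seed=0):
    """Random weights for a toy mixer (hypothetical layout, not from the paper)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_time": rng.normal(0.0, scale, (seq_len, seq_len)),        # mixes time steps
        "W_chan": rng.normal(0.0, scale, (n_channels, n_channels)),  # mixes channels
        "W_out": rng.normal(0.0, scale, (horizon, seq_len)),         # input length -> horizon
    }

def mixer_forecast(x, params):
    """x has shape (seq_len, n_channels); returns (horizon, n_channels)."""
    # Temporal mixing: each channel's series is transformed along the time axis only.
    h = x + params["W_time"] @ x
    # Channel mixing: at every time step, channels are combined with each other.
    h = h + h @ params["W_chan"]
    # Map the mixed input sequence onto the prediction sequence.
    return params["W_out"] @ h

x = np.random.default_rng(1).normal(size=(96, 7))  # 96 past steps, 7 variables
params = init_params(seq_len=96, n_channels=7, horizon=24)
y_hat = mixer_forecast(x, params)
print(y_hat.shape)  # (24, 7)
```

Because the two mixing steps use separate weight matrices, temporal and channel interactions are learned by distinct parameters rather than entangled in a single attention map. A practical model would add nonlinearities, normalization, and training.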
Related papers
- Robust Multivariate Time Series Forecasting against Intra- and Inter-Series Transitional Shift [40.734564394464556]
We present a unified Probabilistic Graphical Model that jointly captures intra-/inter-series correlations and models the time-variant transitional distribution.
We validate the effectiveness and efficiency of JointPGM through extensive experiments on six highly non-stationary MTS datasets.
arXiv Detail & Related papers (2024-07-18T06:16:03Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series [54.91026286579748]
We propose a Multi-Grained Correlations-based Prediction Network.
It simultaneously considers correlations at three levels to enhance prediction performance.
It employs adversarial training with an attention mechanism-based predictor and a conditional discriminator to optimize prediction results at the coarse-grained level.
arXiv Detail & Related papers (2024-05-30T03:32:44Z)
- HiMTM: Hierarchical Multi-Scale Masked Time Series Modeling with Self-Distillation for Long-Term Forecasting [17.70984737213973]
HiMTM is a hierarchical multi-scale masked time series modeling method with self-distillation for long-term forecasting.
HiMTM integrates four key components: (1) hierarchical multi-scale transformer (HMT) to capture temporal information at different scales; (2) decoupled encoder-decoder (DED) that directs the encoder towards feature extraction while the decoder focuses on pretext tasks.
Experiments on seven mainstream datasets show that HiMTM surpasses state-of-the-art self-supervised and end-to-end learning methods by a considerable margin of 3.16-68.54%.
arXiv Detail & Related papers (2024-01-10T09:00:03Z)
- Perceiver-based CDF Modeling for Time Series Forecasting [25.26713741799865]
We propose a new architecture, called perceiver-CDF, for modeling cumulative distribution functions (CDF) of time series data.
Our approach combines the perceiver architecture with a copula-based attention mechanism tailored for multimodal time series prediction.
Experiments on the unimodal and multimodal benchmarks consistently demonstrate a 20% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2023-10-03T01:13:17Z)
- Multi-scale Transformer Pyramid Networks for Multivariate Time Series Forecasting [8.739572744117634]
We introduce a dimension invariant embedding technique that captures short-term temporal dependencies.
We present a novel Multi-scale Transformer Pyramid Network (MTPNet) specifically designed to capture temporal dependencies at multiple unconstrained scales.
arXiv Detail & Related papers (2023-08-23T06:40:05Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI type Transformer in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting [50.48888534815361]
We show that models trained with the Channel Independent (CI) strategy outperform those trained with the Channel Dependent (CD) strategy.
Our results indicate that the CD approach has higher capacity but often lacks the robustness needed to accurately predict distributionally drifted time series.
We propose a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy.
arXiv Detail & Related papers (2023-04-11T13:15:33Z)
- DRAformer: Differentially Reconstructed Attention Transformer for Time-Series Forecasting [7.805077630467324]
Time-series forecasting plays an important role in many real-world scenarios, such as equipment life cycle forecasting, weather forecasting, and traffic flow forecasting.
It can be observed from recent research that a variety of transformer-based models have shown remarkable results in time-series forecasting.
However, there are still some issues that limit the ability of transformer-based models on time-series forecasting tasks.
arXiv Detail & Related papers (2022-06-11T10:34:29Z)
- Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)