Scalable Transformer for High Dimensional Multivariate Time Series Forecasting
- URL: http://arxiv.org/abs/2408.04245v1
- Date: Thu, 8 Aug 2024 06:17:13 GMT
- Title: Scalable Transformer for High Dimensional Multivariate Time Series Forecasting
- Authors: Xin Zhou, Weiqing Wang, Wray Buntine, Shilin Qu, Abishek Sriramulu, Weicong Tan, Christoph Bergmeir
- Abstract summary: This study investigates the reasons behind the suboptimal performance of channel-dependent models on high-dimensional MTS data.
We propose STHD, the Scalable Transformer for High-Dimensional Multivariate Time Series Forecasting.
Experiments show STHD's considerable improvement on three high-dimensional datasets: Crime-Chicago, Wiki-People, and Traffic.
- Score: 10.17270031004674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep models for Multivariate Time Series (MTS) forecasting have recently demonstrated significant success. Channel-dependent models capture complex dependencies that channel-independent models cannot capture. However, the number of channels in real-world applications outpaces the capabilities of existing channel-dependent models, and contrary to common expectations, some of these models underperform channel-independent models on high-dimensional data, which raises questions about the performance of channel-dependent models. To address this, our study first investigates the reasons behind the suboptimal performance of channel-dependent models on high-dimensional MTS data. Our analysis reveals two primary issues: noise introduced by unrelated series, which increases the difficulty of capturing the crucial inter-channel dependencies, and challenges in training strategies caused by high-dimensional data. To address these issues, we propose STHD, the Scalable Transformer for High-Dimensional Multivariate Time Series Forecasting. STHD has three components: a) Relation Matrix Sparsity, which limits the introduced noise and alleviates the memory issue; b) ReIndex, a training strategy that enables more flexible batch-size settings and increases the diversity of training data; and c) a Transformer that handles 2-D inputs and captures channel dependencies. These components jointly enable STHD to manage high-dimensional MTS while maintaining computational feasibility. Furthermore, experimental results show STHD's considerable improvement on three high-dimensional datasets: Crime-Chicago, Wiki-People, and Traffic. The source code and dataset are publicly available at https://github.com/xinzzzhou/ScalableTransformer4HighDimensionMTSF.git.
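The abstract describes the components at a high level only. The sketch below is a rough, assumption-laden illustration of the first two ideas: a top-k-sparsified correlation matrix as the relation matrix, and a ReIndex-style flattening of (channel, window) pairs into independent samples. The function names, the use of Pearson correlation, and the value of k are illustrative guesses, not the authors' implementation; the 2-D-input Transformer component is omitted.

```python
import torch

# Hedged sketch of two of the STHD components described in the abstract.
# All names and hyperparameters are illustrative assumptions, not the
# authors' actual implementation.

def sparse_relation_matrix(x: torch.Tensor, k: int = 8) -> torch.Tensor:
    """x: (num_channels, time). Keep only the k most correlated channels
    per target channel to limit noise from unrelated series."""
    x = (x - x.mean(dim=1, keepdim=True)) / (x.std(dim=1, keepdim=True) + 1e-8)
    corr = x @ x.t() / x.shape[1]               # (C, C) correlation estimate
    topk = corr.abs().topk(k, dim=1).indices    # indices of k related channels
    mask = torch.zeros_like(corr)
    mask.scatter_(1, topk, 1.0)
    return corr * mask                          # sparse relation matrix

def reindex(x: torch.Tensor, lookback: int, horizon: int):
    """ReIndex-style flattening: every (channel, window) pair becomes an
    independent sample, which decouples batch size from channel count."""
    samples, targets = [], []
    num_channels, length = x.shape
    for c in range(num_channels):
        for t in range(0, length - lookback - horizon + 1):
            samples.append(x[c, t:t + lookback])
            targets.append(x[c, t + lookback:t + lookback + horizon])
    return torch.stack(samples), torch.stack(targets)

# Usage on toy data: 100 channels, 200 time steps.
series = torch.randn(100, 200)
relation = sparse_relation_matrix(series, k=8)
inputs, labels = reindex(series, lookback=96, horizon=24)
print(relation.shape, inputs.shape, labels.shape)
```

Flattening every (channel, window) pair into its own sample is what decouples the batch size from the number of channels, which is presumably what enables the flexible batch-size setting mentioned in the abstract.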
Related papers
- Partial Channel Dependence with Channel Masks for Time Series Foundation Models [5.752266579415516]
We introduce the concept of partial channel dependence (PCD), which enables a more sophisticated adjustment of channel dependencies based on dataset-specific information.
We validate the effectiveness of PCD across four tasks in time series (TS) including forecasting, classification, imputation, and anomaly detection.
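The summary does not say how the channel masks enter the model. As a speculative illustration only, one plausible placement is a dataset-specific matrix that rescales cross-channel attention scores, interpolating between channel independence and full dependence; the actual PCD formulation may differ.

```python
import torch

# Speculative illustration only: a "channel mask" here is a dataset-specific
# matrix in [0, 1] that dampens attention between unrelated channels.
def masked_channel_attention(q, k, v, channel_mask):
    """q, k, v: (batch, channels, dim); channel_mask: (channels, channels)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, C, C)
    scores = scores * channel_mask                          # partial channel dependence
    return torch.softmax(scores, dim=-1) @ v                # (B, C, dim)

B, C, D = 4, 16, 32
mask = torch.sigmoid(torch.randn(C, C))  # would be learned / dataset-specific in practice
out = masked_channel_attention(torch.randn(B, C, D), torch.randn(B, C, D),
                               torch.randn(B, C, D), mask)
print(out.shape)  # torch.Size([4, 16, 32])
```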
arXiv Detail & Related papers (2024-10-30T17:12:03Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
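A minimal sketch of the stated idea, assuming standard non-overlapping patching: per-channel patch tokens are flattened into one sequence so a single (unified) attention mixes both intra-series and inter-series information. Patch length, model width, and head count are arbitrary choices, not taken from the paper.

```python
import torch
import torch.nn as nn

# Hedged sketch: flatten (channel, patch) tokens into one sequence so one
# attention layer can attend across channels and across time patches at once.
class UnifiedPatchAttention(nn.Module):
    def __init__(self, patch_len: int = 16, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); time must be divisible by patch_len here.
        b, c, t = x.shape
        patches = x.unfold(-1, self.patch_len, self.patch_len)  # (B, C, P, patch_len)
        tokens = self.embed(patches).flatten(1, 2)               # (B, C*P, d_model)
        out, _ = self.attn(tokens, tokens, tokens)               # unified attention
        return out                                               # (B, C*P, d_model)

x = torch.randn(2, 7, 96)                 # 7 channels, 96 time steps
print(UnifiedPatchAttention()(x).shape)   # torch.Size([2, 42, 64])
```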
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- PDMLP: Patch-based Decomposed MLP for Long-Term Time Series Forecasting [0.0]
Recent studies have attempted to refine the Transformer architecture to demonstrate its effectiveness in Long-Term Time Series Forecasting (LTSF) tasks.
We attribute the effectiveness of these models largely to the adopted Patch mechanism, which enhances sequence locality.
Further investigation suggests that simple linear layers augmented with the Patch mechanism may outperform complex Transformer-based LTSF models.
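As a rough illustration of that finding (patching followed by plain linear layers, applied to each channel independently), with arbitrary sizes and without PDMLP's actual components such as decomposition:

```python
import torch
import torch.nn as nn

# Hedged sketch of "linear layers augmented with the Patch mechanism":
# split each series into patches, embed them linearly, and map the
# concatenated patch embeddings straight to the forecast horizon.
class PatchLinearForecaster(nn.Module):
    def __init__(self, lookback=96, horizon=24, patch_len=16, d_model=32):
        super().__init__()
        n_patches = lookback // patch_len
        self.patch_len = patch_len
        self.patch_embed = nn.Linear(patch_len, d_model)
        self.head = nn.Linear(n_patches * d_model, horizon)

    def forward(self, x):
        # x: (batch, channels, lookback); each channel is forecast independently.
        patches = x.unfold(-1, self.patch_len, self.patch_len)  # (B, C, P, patch_len)
        z = self.patch_embed(patches).flatten(-2)               # (B, C, P*d_model)
        return self.head(z)                                     # (B, C, horizon)

model = PatchLinearForecaster()
print(model(torch.randn(8, 5, 96)).shape)  # torch.Size([8, 5, 24])
```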
arXiv Detail & Related papers (2024-05-22T12:12:20Z)
- Considering Nonstationarity within Multivariate Time Series with Variational Hierarchical Transformer for Forecasting [12.793705636683402]
We develop a powerful hierarchical probabilistic generative module to consider the non-stationarity and intrinsic characteristics within MTS.
We then combine it with a transformer to form a well-defined variational generative dynamic model named Hierarchical Time series Variational Transformer (HTV-Trans).
Being a powerful probabilistic model, HTV-Trans is utilized to learn expressive representations of MTS and applied to forecasting tasks.
arXiv Detail & Related papers (2024-03-08T16:04:36Z)
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [94.10861578387443]
We explore the inference process of two mainstream T2V models using transformers and diffusion models.
We propose a training-free and generalized pruning strategy called F3-Pruning to prune redundant temporal attention weights.
Extensive experiments on three datasets using a classic transformer-based model CogVideo and a typical diffusion-based model Tune-A-Video verify the effectiveness of F3-Pruning.
arXiv Detail & Related papers (2023-12-06T12:34:47Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
arXiv Detail & Related papers (2023-11-09T05:24:20Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving classification capacity on multivariate time series classification tasks.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets [91.25055890980084]
There still remains an extreme performance gap between Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) when training from scratch on small datasets.
We propose Dynamic Hybrid Vision Transformer (DHVT) as the solution to enhance the two inductive biases.
Our DHVT achieves state-of-the-art performance with lightweight models: 85.68% on CIFAR-100 with 22.8M parameters and 82.3% on ImageNet-1K with 24.0M parameters.
arXiv Detail & Related papers (2022-10-12T06:54:39Z)
- DBT-DMAE: An Effective Multivariate Time Series Pre-Train Model under Missing Data [16.589715330897906]
MTS suffers from missing-data problems, which lead to degradation or collapse of downstream tasks.
This paper presents a universally applicable MTS pre-train model, DBT-DMAE, to conquer the above-mentioned obstacle.
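The summary gives no architectural detail; the sketch below only illustrates the generic masked-reconstruction pre-training recipe such a model could use on MTS with missing values, and is not the DBT-DMAE architecture.

```python
import torch
import torch.nn as nn

# Hedged sketch of generic masked-reconstruction pre-training for MTS with
# missing values: hide a random subset of observed entries, encode the masked
# input, and score reconstruction only on the hidden-but-observed entries.
class TinyMaskedAutoencoder(nn.Module):
    def __init__(self, n_channels=8, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_channels, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, n_channels)

    def forward(self, x, mask):
        h, _ = self.encoder(x * mask)   # zero out masked / missing entries
        return self.decoder(h)          # reconstruct every entry

x = torch.randn(4, 50, 8)                            # (batch, time, channels)
observed = torch.rand_like(x) > 0.2                  # pretend 20% of values are missing
train_mask = observed & (torch.rand_like(x) > 0.3)   # additionally hide 30% of observed entries
model = TinyMaskedAutoencoder()
recon = model(x, train_mask.float())
loss = ((recon - x)[observed & ~train_mask] ** 2).mean()  # loss on hidden-but-observed entries
print(loss.item())
```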
arXiv Detail & Related papers (2022-09-16T08:54:02Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our model achieves state-of-the-art performance across a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.