Related papers: DSAT-HD: Dual-Stream Adaptive Transformer with Hybrid Decomposition for Multivariate Time Series Forecasting

URL: http://arxiv.org/abs/2509.24800v1
Date: Mon, 29 Sep 2025 13:50:56 GMT
Title: DSAT-HD: Dual-Stream Adaptive Transformer with Hybrid Decomposition for Multivariate Time Series Forecasting
Authors: Zixu Wang, Hongbin Dong, Xiaoping Zhang
Abstract summary: Time series forecasting is crucial for various applications, such as weather, traffic, electricity, and energy predictions. Existing approaches primarily model limited time series or fixed scales, making it more challenging to capture diverse features across different ranges. We propose the Hybrid Decomposition Dual-Stream Adaptive Transformer (DSAT-HD), which integrates three key innovations to address the limitations of existing methods.
Score: 14.708544628811381
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Time series forecasting is crucial for various applications, such as weather, traffic, electricity, and energy predictions. Currently, common time series forecasting methods are based on Transformers. However, existing approaches primarily model limited time series or fixed scales, making it more challenging to capture diverse features across different ranges. Additionally, traditional methods like STL for complex seasonality-trend decomposition require pre-specified seasonal periods and typically handle only single, fixed seasonality. We propose the Hybrid Decomposition Dual-Stream Adaptive Transformer (DSAT-HD), which integrates three key innovations to address the limitations of existing methods: 1) A hybrid decomposition mechanism combining EMA and Fourier decomposition with RevIN normalization, dynamically balancing seasonal and trend components through noise Top-k gating; 2) A multi-scale adaptive pathway leveraging a sparse allocator to route features to four parallel Transformer layers, followed by feature merging via a sparse combiner, enhanced by hybrid attention combining local CNNs and global interactions; 3) A dual-stream residual learning framework where CNN and MLP branches separately process seasonal and trend components, coordinated by a balanced loss function minimizing expert collaboration variance. Extensive experiments on nine datasets demonstrate that DSAT-HD outperforms existing methods overall and achieves state-of-the-art performance on some datasets. Notably, it also exhibits stronger generalization capabilities across various transfer scenarios.
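Illustrative sketch: the abstract names two building blocks, an EMA-based trend/seasonal split and a noisy top-k gate that routes features to a few of several parallel branches. The Python below is a minimal, generic illustration of those two ideas and is not the authors' implementation. The function names, the smoothing factor `alpha`, the noise scale `noise_std`, and the use of plain NumPy are all assumptions made for this example; the paper's Fourier decomposition, RevIN normalization, and learned gating weights are omitted.

```python
import numpy as np


def ema_decompose(x, alpha=0.3):
    """Split a 1-D series into an EMA trend and the residual around it.

    alpha is a hypothetical smoothing factor, not a value from the paper.
    """
    x = np.asarray(x, dtype=float)
    trend = np.empty_like(x)
    trend[0] = x[0]
    for t in range(1, len(x)):
        # Standard exponential moving average recursion.
        trend[t] = alpha * x[t] + (1.0 - alpha) * trend[t - 1]
    seasonal = x - trend  # whatever the smooth trend does not explain
    return trend, seasonal


def noisy_topk_gate(logits, k, noise_std=1.0, rng=None):
    """Generic noisy top-k gating over a vector of expert scores.

    Adds Gaussian noise to the scores, keeps the k largest, and applies a
    softmax over the kept entries; all other experts get weight 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    noisy = logits + noise_std * rng.standard_normal(logits.shape)
    topk = np.argsort(noisy)[-k:]  # indices of the k largest noisy scores
    weights = np.zeros_like(noisy)
    exp = np.exp(noisy[topk] - noisy[topk].max())  # numerically stable softmax
    weights[topk] = exp / exp.sum()
    return weights


if __name__ == "__main__":
    series = np.sin(np.linspace(0, 6 * np.pi, 48)) + np.linspace(0, 2, 48)
    trend, seasonal = ema_decompose(series)
    # Four scores, mirroring the four parallel branches mentioned in the abstract.
    gate = noisy_topk_gate([0.1, 2.0, -1.0, 1.5], k=2)
    print(trend[:3], seasonal[:3], gate)
```

By construction the two components sum back to the input series, and the gate assigns non-zero weight to exactly `k` branches, which is what makes the routing sparse.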

Related papers

AltTS: A Dual-Path Framework with Alternating Optimization for Multivariate Time Series Forecasting [27.971282358985604]
We propose ALTTS, a dual-path framework that explicitly decouples autoregression and cross-relation modeling. We show that ALTTS consistently outperforms prior methods, with the most pronounced improvements on long-horizon forecasting.
arXiv Detail & Related papers (2026-02-12T03:45:00Z)
DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z)
MoHETS: Long-term Time Series Forecasting with Mixture-of-Heterogeneous-Experts [0.8292000624465587]
Real-world time series can exhibit intricate multi-scale structures, including global trends, local periodicities, and non-stationary regimes. MoHETS integrates sparse Mixture-of-Heterogeneous-Experts layers. We replace parameter-heavy linear projection heads with a lightweight convolutional patch decoder.
arXiv Detail & Related papers (2026-01-29T15:35:26Z)
ScatterFusion: A Hierarchical Scattering Transform Framework for Enhanced Time Series Forecasting [3.453296006042559]
Time series forecasting presents significant challenges due to the complex temporal dependencies at multiple time scales. This paper introduces ScatterFusion, a novel framework that integrates scattering transforms with hierarchical attention mechanisms for robust time series forecasting.
arXiv Detail & Related papers (2026-01-28T09:06:01Z)
AdaMixT: Adaptive Weighted Mixture of Multi-Scale Expert Transformers for Time Series Forecasting [15.522567372502762]
We propose a novel architecture named Adaptive Weighted Mixture of Multi-Scale Expert Transformers (AdaMixT). AdaMixT introduces various patches and leverages both General Pre-trained Models (GPM) and Domain-specific Models (DSM) for multi-scale feature extraction. Comprehensive experiments on eight widely used benchmarks, including Weather, Traffic, Electricity, ILI, and four ETT datasets, consistently demonstrate the effectiveness of AdaMixT.
arXiv Detail & Related papers (2025-09-09T15:30:53Z)
MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z)
PFformer: A Position-Free Transformer Variant for Extreme-Adaptive Multivariate Time Series Forecasting [9.511600544581425]
PFformer is a position-free Transformer-based model designed for single-target MTS forecasting. PFformer integrates two novel embedding strategies: Enhanced Feature-based Embedding (EFE) and Auto-Encoder-based Embedding (AEE).
arXiv Detail & Related papers (2025-02-27T22:21:27Z)
Ister: Inverted Seasonal-Trend Decomposition Transformer for Explainable Multivariate Time Series Forecasting [10.32586981170693]
This paper presents the Inverted Seasonal-Trend Decomposition Transformer (Ister). We introduce a novel Dot-attention mechanism that improves interpretability, computational efficiency, and predictive accuracy. Ister enables intuitive visualization of component contributions, shedding light on the model's decision process and enhancing transparency in prediction results.
arXiv Detail & Related papers (2024-12-25T06:37:19Z)
Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a causal Transformer for unified time series forecasting. Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series [54.91026286579748]
We propose a Multi-Grained Correlations-based Prediction Network. It simultaneously considers correlations at three levels to enhance prediction performance. It employs adversarial training with an attention mechanism-based predictor and conditional discriminator to optimize prediction results at coarse-grained level.
arXiv Detail & Related papers (2024-05-30T03:32:44Z)
TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS). We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z)
CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of CI type Transformer in time series forecasting. First, CARD introduces a channel-aligned attention structure that allows it to capture temporal correlations among signals. Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions. Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue.
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task. It has three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning. Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism. We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting [4.666618110838523]
Time series forecasting is widely used in equipment life cycle forecasting, weather forecasting, traffic flow forecasting, and other fields. Some scholars have tried to apply Transformer to time series forecasting because of its powerful parallel training ability. The existing Transformer methods do not pay enough attention to the small time segments that play a decisive role in prediction.
arXiv Detail & Related papers (2022-02-23T10:33:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.