SST: Multi-Scale Hybrid Mamba-Transformer Experts for Long-Short Range Time Series Forecasting
- URL: http://arxiv.org/abs/2404.14757v2
- Date: Thu, 22 Aug 2024 17:55:42 GMT
- Title: SST: Multi-Scale Hybrid Mamba-Transformer Experts for Long-Short Range Time Series Forecasting
- Authors: Xiongxiao Xu, Canyu Chen, Yueqing Liang, Baixiang Huang, Guangji Bai, Liang Zhao, Kai Shu
- Abstract summary: Time series can be decomposed into global patterns and local variations.
We propose a hybrid Mamba-Transformer experts model, State Space Transformer (SST).
SST can achieve SOTA results in long-short range time series forecasting while maintaining low memory footprint and computational cost.
- Score: 20.87352811680857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite significant progress in time series forecasting, existing forecasters often overlook the heterogeneity between long-range and short-range time series, leading to performance degradation in practical applications. In this work, we highlight the need of distinct objectives tailored to different ranges. We point out that time series can be decomposed into global patterns and local variations, which should be addressed separately in long- and short-range time series. To meet the objectives, we propose a multi-scale hybrid Mamba-Transformer experts model State Space Transformer (SST). SST leverages Mamba as an expert to extract global patterns in coarse-grained long-range time series, and Local Window Transformer (LWT), the other expert to focus on capturing local variations in fine-grained short-range time series. With an input-dependent mechanism, State Space Model (SSM)-based Mamba is able to selectively retain long-term patterns and filter out fluctuations, while LWT employs a local window to enhance locality-awareness capability, thus effectively capturing local variations. To adaptively integrate the global patterns and local variations, a long-short router dynamically adjusts contributions of the two experts. SST achieves superior performance with scaling linearly $O(L)$ on time series length $L$. The comprehensive experiments demonstrate the SST can achieve SOTA results in long-short range time series forecasting while maintaining low memory footprint and computational cost. The code of SST is available at https://github.com/XiongxiaoXu/SST.
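To make the architecture described in the abstract more concrete, below is a minimal, hedged PyTorch sketch of the expert-mixture idea: a global expert working on a coarse-grained version of the series (a GRU stands in for the Mamba/SSM block here), a local-window attention expert standing in for LWT, and a learned long-short router that softly mixes the two outputs. All module names, dimensions, pooling and windowing choices are illustrative assumptions, not the released SST implementation; see https://github.com/XiongxiaoXu/SST for the official code.

```python
# Hedged sketch of an SST-style two-expert mixture; a GRU stands in for Mamba.
import torch
import torch.nn as nn


class LocalWindowExpert(nn.Module):
    """Self-attention restricted to non-overlapping windows (stand-in for LWT)."""

    def __init__(self, d_model: int, window: int, n_heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=2 * d_model, batch_first=True
        )

    def forward(self, x):                        # x: (batch, length, d_model)
        b, l, d = x.shape
        w = self.window
        pad = (-l) % w                           # pad so length is a multiple of w
        x = nn.functional.pad(x, (0, 0, 0, pad))
        x = x.view(b * (x.shape[1] // w), w, d)  # fold windows into the batch
        x = self.attn(x)                         # attention sees one window at a time
        return x.view(b, -1, d)[:, :l]


class GlobalExpert(nn.Module):
    """Recurrent stand-in for the Mamba expert on the coarse-grained series."""

    def __init__(self, d_model: int, pool: int = 4):
        super().__init__()
        self.pool = nn.AvgPool1d(pool)           # coarse-grain the sequence
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.up = pool

    def forward(self, x):                        # x: (batch, length, d_model)
        b, l, d = x.shape
        z = self.pool(x.transpose(1, 2)).transpose(1, 2)
        z, _ = self.rnn(z)
        return z.repeat_interleave(self.up, dim=1)[:, :l]  # back to fine resolution


class SSTSketch(nn.Module):
    def __init__(self, d_model=64, window=16, horizon=96, n_series=7):
        super().__init__()
        self.embed = nn.Linear(n_series, d_model)
        self.global_expert = GlobalExpert(d_model)
        self.local_expert = LocalWindowExpert(d_model, window)
        self.router = nn.Linear(d_model, 2)      # long-short router: 2 expert weights
        self.head = nn.Linear(d_model, n_series)
        self.horizon = horizon

    def forward(self, x):                        # x: (batch, length, n_series)
        h = self.embed(x)
        g, loc = self.global_expert(h), self.local_expert(h)
        w = torch.softmax(self.router(h.mean(dim=1)), dim=-1)   # (batch, 2)
        mixed = w[:, 0, None, None] * g + w[:, 1, None, None] * loc
        return self.head(mixed[:, -self.horizon:])              # forecast from the tail


y = SSTSketch()(torch.randn(2, 192, 7))          # -> torch.Size([2, 96, 7])
print(y.shape)
```

In this sketch the router produces one scalar weight per expert from a pooled summary of the input; how the actual long-short router conditions its weights is not specified in the abstract and may differ.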
Related papers
- TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state [12.940694192516059]
In long-term time series forecasting, different variables often influence the target variable over distinct time intervals.
Traditional models typically process all variables or time points uniformly, which limits their ability to capture complex variable relationships.
We propose TimePro, an innovative Mamba-based model that constructs variate- and time-aware hyper-states.
arXiv Detail & Related papers (2025-05-27T06:24:21Z)
- LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics [56.99021951927683]
Time Series Forecasting (TSF) is critical in many real-world domains like financial planning and health monitoring.
Existing Large Language Models (LLMs) usually perform suboptimally because they neglect the inherent characteristics of time series data.
We propose LLM-PS to empower the LLM for TSF by learning the fundamental patterns and meaningful semantics from time series data.
arXiv Detail & Related papers (2025-03-12T11:45:11Z)
- S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting [31.19126944008011]
Time series forecasting has recently achieved significant progress with multi-scale models to address the heterogeneity between long and short range patterns.
We propose State Space Transformer with cross-attention (S2TX) to address these concerns.
S2TX can achieve highly robust SOTA results while maintaining a low memory footprint.
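As a rough illustration of the cross-attention idea suggested by the title, the snippet below lets fine-grained (short-range) tokens attend to a coarse global context sequence; the shapes and the source of the coarse context are assumptions, not details taken from the paper.

```python
# Hedged sketch of cross-attention between fine-grained tokens and a coarse context.
import torch
import torch.nn as nn

d_model = 64
cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

fine = torch.randn(2, 96, d_model)    # fine-grained (short-range) tokens
coarse = torch.randn(2, 24, d_model)  # coarse global context (e.g., from an SSM branch)

fused, _ = cross_attn(query=fine, key=coarse, value=coarse)
print(fused.shape)                    # (2, 96, 64): local tokens enriched with global context
```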
arXiv Detail & Related papers (2025-02-17T01:40:45Z)
- UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba [7.594115034632109]
We propose UmambaTSF, a novel long-term time series forecasting framework.
It integrates multi-scale feature extraction capabilities of U-shaped encoder-decoder multilayer perceptrons (MLP) with Mamba's long sequence representation.
UmambaTSF achieves state-of-the-art performance and excellent generality on widely used benchmark datasets.
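A toy sketch of the U-shaped MLP ingredient mentioned above: the time axis is compressed by an encoder MLP, expanded back by a decoder MLP, and joined by a skip connection. The layer sizes and the omission of the Mamba branch are simplifications assumed here.

```python
# Minimal U-shaped MLP over the time axis with a skip connection (illustrative only).
import torch
import torch.nn as nn

class TinyUMLP(nn.Module):
    def __init__(self, length=96, horizon=96):
        super().__init__()
        self.down = nn.Linear(length, length // 2)   # encoder: compress the time axis
        self.up = nn.Linear(length // 2, length)     # decoder: restore the resolution
        self.head = nn.Linear(length, horizon)

    def forward(self, x):                  # x: (batch, channels, length)
        z = torch.relu(self.down(x))
        z = self.up(z) + x                 # U-shaped skip connection
        return self.head(z)

print(TinyUMLP()(torch.randn(2, 7, 96)).shape)   # (2, 7, 96)
```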
arXiv Detail & Related papers (2024-10-15T04:56:43Z)
- Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts [103.725112190618]
This paper introduces Moirai-MoE, using a single input/output projection layer while delegating the modeling of diverse time series patterns to the sparse mixture of experts.
Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios.
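To make the sparse mixture-of-experts idea concrete, here is a generic top-k gated MoE layer in PyTorch; the expert width, the number of experts, and the top-k routing rule are assumptions rather than Moirai-MoE's exact design.

```python
# Generic top-k sparse mixture-of-experts layer (illustrative, not Moirai-MoE's code).
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 2 * d_model), nn.GELU(),
                          nn.Linear(2 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.gate(x)                      # (tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        weights = torch.softmax(topv, dim=-1)      # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # plain loop for clarity only
            idx = topi[:, slot]
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(SparseMoE()(torch.randn(10, 64)).shape)      # (10, 64)
```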
arXiv Detail & Related papers (2024-10-14T13:01:11Z)
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- Oscillatory State-Space Models [61.923849241099184]
We propose Linear Oscillatory State-Space models (LinOSS) for efficiently learning on long sequences.
A stable discretization, integrated over time using fast associative parallel scans, yields the proposed state-space model.
We show that LinOSS is universal, i.e., it can approximate any continuous and causal operator mapping between time-varying functions.
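The parallel-scan claim rests on the fact that linear recurrence steps compose associatively. The toy example below checks that composing affine steps (a, b) reproduces the sequential recurrence h_t = a_t * h_{t-1} + b_t; the scalar state and coefficient ranges are illustrative assumptions, and LinOSS itself uses oscillatory (second-order) dynamics rather than this generic form.

```python
# Illustration: linear recurrence steps compose associatively, enabling parallel scans.
import torch

def combine(left, right):
    """Compose two affine steps (a1, b1) then (a2, b2) into a single step."""
    a1, b1 = left
    a2, b2 = right
    return a2 * a1, a2 * b1 + b2

T = 8
a = torch.rand(T) * 0.9            # decay coefficients, kept < 1 for stability
b = torch.randn(T)                 # inputs after the B projection

# Reference: plain sequential recurrence h_t = a_t * h_{t-1} + b_t.
h, seq = torch.tensor(0.0), []
for t in range(T):
    h = a[t] * h + b[t]
    seq.append(h)

# Same prefix states via left-to-right composition of affine steps; on parallel
# hardware this composition is evaluated as a logarithmic-depth associative scan.
acc, scan = (torch.tensor(1.0), torch.tensor(0.0)), []
for t in range(T):
    acc = combine(acc, (a[t], b[t]))
    scan.append(acc[1])            # acc maps h_{-1} = 0 to h_t

print(torch.allclose(torch.stack(seq), torch.stack(scan)))   # True
```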
arXiv Detail & Related papers (2024-10-04T22:00:13Z)
- MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with 0.1K Parameters [6.733646592789575]
Long-term Time Series Forecasting (LTSF) involves predicting long-term values by analyzing a large amount of historical time-series data to identify patterns and trends.
Transformer-based models offer high forecasting accuracy, but they are often too compute-intensive to be deployed on devices with hardware constraints.
We propose MixLinear, an ultra-lightweight time series forecasting model specifically designed for resource-constrained devices.
arXiv Detail & Related papers (2024-10-02T23:04:57Z)
- Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics [7.745945701278489]
Long-short range time series forecasting is essential for predicting future trends and patterns over extended periods.
Deep learning models such as Transformers have made significant strides in advancing time series forecasting.
This article examines the advantages and disadvantages of both Mamba and Transformer models.
arXiv Detail & Related papers (2024-09-13T04:23:54Z)
- Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
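A loose sketch of a bidirectional, gated recurrent block in the spirit of the title; a GRU stands in for Mamba and the sigmoid gating form is an assumption, not the paper's design.

```python
# Bidirectional, gated recurrent block (GRU as a stand-in for Mamba; illustrative only).
import torch
import torch.nn as nn

class BiGatedBlock(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.fwd = nn.GRU(d_model, d_model, batch_first=True)
        self.bwd = nn.GRU(d_model, d_model, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                              # x: (batch, length, d_model)
        f, _ = self.fwd(x)
        b, _ = self.bwd(torch.flip(x, dims=[1]))
        b = torch.flip(b, dims=[1])                    # re-align the reversed outputs
        g = torch.sigmoid(self.gate(torch.cat([f, b], dim=-1)))
        return g * f + (1 - g) * b                     # gated fusion of both directions

print(BiGatedBlock()(torch.randn(2, 50, 64)).shape)    # (2, 50, 64)
```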
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
- TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification [13.110156202816112]
We propose a novel multi-view approach to capture patterns with properties like shift equivariance.
Our method integrates diverse features, including spectral, temporal, local, and global features, to obtain rich, complementary contexts for TSC.
Our approach achieves average accuracy improvements of 4.01-6.45% and 7.93% respectively, over leading TSC models.
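A small sketch of the multi-view feature idea: pairing a time-domain view with a spectral view obtained from an FFT. The concrete feature set and how TSCMamba fuses the views are assumptions here, not details from the paper.

```python
# Pair a temporal view with a spectral view of a multivariate series (illustrative).
import torch

x = torch.randn(2, 7, 96)                     # (batch, channels, length)
spectral = torch.fft.rfft(x, dim=-1).abs()    # (2, 7, 49): frequency-domain view
temporal = x                                  # time-domain view
views = torch.cat([temporal, spectral], dim=-1)
print(views.shape)                            # (2, 7, 145): complementary contexts
```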
arXiv Detail & Related papers (2024-06-06T18:05:10Z)
- MambaTS: Improved Selective State Space Models for Long-term Time Series Forecasting [12.08746904573603]
Mamba, based on selective state space models (SSMs), has emerged as a competitive alternative to Transformer.
We propose four targeted improvements, leading to MambaTS.
Experiments conducted on eight public datasets demonstrate that MambaTS achieves new state-of-the-art performance.
arXiv Detail & Related papers (2024-05-26T05:50:17Z)
- TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting [13.110156202816112]
TimeMachine exploits the unique properties of time series data to produce salient contextual cues at multi-scales.
TimeMachine achieves superior performance in prediction accuracy, scalability, and memory efficiency, as extensively validated using benchmark datasets.
arXiv Detail & Related papers (2024-03-14T22:19:37Z)
- Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a masked encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
- Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
- Grouped self-attention mechanism for a memory-efficient Transformer [64.0125322353281]
Real-world tasks such as forecasting weather, electricity consumption, and the stock market involve predicting data that vary over time.
Time-series data are generally recorded over a long period of observation with long sequences owing to their periodic characteristics and long-range dependencies over time.
We propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA)
Our proposed model efficiently exhibited reduced computational complexity and performance comparable to or better than existing methods.
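As a hedged illustration of the memory-saving direction these modules take, the snippet below compresses the key/value sequence before attention so the attention matrix shrinks from L x L to L x (L/factor); the pooling choice and compression factor are assumptions, not the paper's exact CCA design.

```python
# Queries attend to a pooled (compressed) key/value sequence to save memory (illustrative).
import torch
import torch.nn as nn

d_model, factor = 64, 8
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

x = torch.randn(2, 512, d_model)                          # long input sequence
kv = nn.functional.avg_pool1d(x.transpose(1, 2), factor)  # compress along the time axis
kv = kv.transpose(1, 2)                                    # (2, 64, d_model)

out, _ = attn(query=x, key=kv, value=kv)                   # 512x64 scores instead of 512x512
print(out.shape)                                           # (2, 512, 64)
```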
arXiv Detail & Related papers (2022-10-02T06:58:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.