STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction
- URL: http://arxiv.org/abs/2508.12247v1
- Date: Sun, 17 Aug 2025 05:29:58 GMT
- Title: STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction
- Authors: Haolong Chen, Liang Zhang, Zhengyuan Xin, Guangxu Zhu,
- Abstract summary: Spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle to learn complex long-term spatio-temporal dependencies efficiently. In this paper, we propose an efficient Spatio-Temporal Multiscale Mamba (STM2) that includes a multiscale Mamba architecture and an adaptive graph causal convolution network to learn the complex multiscale spatio-temporal dependency.
- Score: 12.810918443757382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle with learning complex long-term spatio-temporal dependencies efficiently. The long-term spatio-temporal dependency learning brings two new challenges: 1) The long-term temporal sequence includes multiscale information naturally which is hard to extract efficiently; 2) The multiscale temporal information from different nodes is highly correlated and hard to model. To address these challenges, we propose an efficient Spatio-Temporal Multiscale Mamba (STM2) that includes a multiscale Mamba architecture to capture the multiscale information efficiently and simultaneously, and an adaptive graph causal convolution network to learn the complex multiscale spatio-temporal dependency. STM2 includes hierarchical information aggregation for different-scale information that guarantees their distinguishability. To capture diverse temporal dynamics across all spatial nodes more efficiently, we further propose an enhanced version termed Spatio-Temporal Mixture of Multiscale Mamba (STM3) that employs a special Mixture-of-Experts architecture, including a more stable routing strategy and a causal contrastive learning strategy to enhance the scale distinguishability. We prove that STM3 has much better routing smoothness and guarantees the pattern disentanglement for each expert successfully. Extensive experiments on real-world benchmarks demonstrate STM2/STM3's superior performance, achieving state-of-the-art results in long-term spatio-temporal time-series prediction.
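As a rough illustration of two ideas named in the abstract (multiscale views of a long sequence, and Mixture-of-Experts routing), the sketch below builds average-pooled views at window sizes 1, 2, 4 and applies a generic top-k softmax gate. This is not the STM2/STM3 method: the pooling scheme, the gate, and the names `multiscale_views` and `top_k_gate` are assumptions made for illustration only, and the paper's own routing strategy and Mamba blocks are not reproduced here.

```python
import math

def multiscale_views(series, num_scales=3):
    """Average-pool `series` with non-overlapping windows of size 1, 2, 4, ...

    Hypothetical helper: a simple stand-in for multiscale extraction.
    """
    views = []
    for s in range(num_scales):
        w = 2 ** s
        pooled = [sum(series[i:i + w]) / w
                  for i in range(0, len(series) - w + 1, w)]
        views.append(pooled)
    return views

def top_k_gate(logits, k=2):
    """Generic top-k gate: softmax over the k largest logits, 0 elsewhere.

    Hypothetical helper: a textbook MoE router, not the paper's strategy.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

series = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
views = multiscale_views(series)          # one view per scale
weights = top_k_gate([0.1, 2.0, 1.0], k=2)  # expert 0 is not selected
```

In a full model, each expert would process one or more of these views and the gate weights would mix the expert outputs per node; those parts are omitted here.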
Related papers
- DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis [22.768341734517815]
Transformer-based models suffer from computational complexity and high memory overhead. Mamba has emerged as a promising linear-time alternative with high expressiveness. DeMa is a dual-path delay-aware Mamba backbone.
arXiv Detail & Related papers (2026-01-09T04:54:56Z)
- Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction [53.555201955973104]
Comprehensively and flexibly capturing the complex spatio-temporal dependencies of human motion is critical for multi-person motion prediction. Existing methods grapple with two primary limitations, one of which is the high computational cost stemming from the time complexity of conventional attention. Our model incorporates four distinct types of spatiotemporal experts, each specializing in capturing different spatial or temporal dependencies.
arXiv Detail & Related papers (2025-12-25T15:01:19Z)
- PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching [51.98089287914147]
Inspired by the two-stage decision-making process in humans, we propose a Pick-and-Play Memory (PPM) construction module for dynamic stereo matching, dubbed PPMStereo.
arXiv Detail & Related papers (2025-10-23T03:52:39Z)
- MGTS-Net: Exploring Graph-Enhanced Multimodal Fusion for Augmented Time Series Forecasting [1.7077661158850292]
We propose MGTS-Net, a Multimodal Graph-enhanced Network for Time Series forecasting. The model consists of three core components: (1) a Multimodal Feature Extraction layer (MFE), (2) a Multimodal Feature Fusion layer (MFF), and (3) a Multi-Scale Prediction layer (MSP).
arXiv Detail & Related papers (2025-10-18T04:47:10Z)
- StoxLSTM: A Stochastic Extended Long Short-Term Memory Network for Time Series Forecasting [20.120876019697445]
The Extended Long Short-Term Memory (xLSTM) network has attracted widespread research interest due to its enhanced capability to model complex temporal dependencies in diverse time series applications. We propose a stochastic xLSTM, termed StoxLSTM, that extends the original architecture into a state space modeling framework by incorporating latent variables within xLSTM. Experiments on publicly available benchmark datasets from multiple research communities demonstrate that StoxLSTM consistently outperforms state-of-the-art baselines with better robustness and stronger generalization ability.
arXiv Detail & Related papers (2025-09-01T07:11:05Z)
- FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
- Multi-granular body modeling with Redundancy-Free Spatiotemporal Fusion for Text-Driven Motion Generation [10.843503146808839]
We introduce HiSTF Mamba, a framework with three parts: Dual-Spatial Mamba, Bi-Temporal Mamba, and a Spatiotemporal Fusion Module (DSFM). Experiments on the HumanML3D benchmark show that HiSTF Mamba performs well across several metrics, achieving high fidelity and tight semantic alignment between text and motion.
arXiv Detail & Related papers (2025-03-10T04:01:48Z)
- MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection [11.534493974662304]
Temporal Action Detection (TAD) in untrimmed videos requires models that can efficiently process long-duration videos. We propose Multi-Scale Temporal Mamba (MS-Temba), the first Mamba-based architecture specifically designed for densely labeled TAD tasks. MS-Temba achieves state-of-the-art performance on long-duration videos, remains competitive on shorter segments, and reduces model complexity by 88%.
arXiv Detail & Related papers (2025-01-10T17:52:47Z)
- UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba [7.594115034632109]
We propose UmambaTSF, a novel long-term time series forecasting framework.
It integrates multi-scale feature extraction capabilities of U-shaped encoder-decoder multilayer perceptrons (MLP) with Mamba's long sequence representation.
UmambaTSF achieves state-of-the-art performance and excellent generality on widely used benchmark datasets.
arXiv Detail & Related papers (2024-10-15T04:56:43Z)
- SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
- Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction [8.558852563471525]
We propose a novel method to exploit the long-range dependency management capabilities of the state space model (SSM).
Our contribution is three-fold. First, we propose ReMamba, which mines multi-scale spatiotemporal information by a multi-directional SSM.
Second, we propose an adaptive fusion strategy that introduces multiple inertial measurement units as temporal auxiliary information.
arXiv Detail & Related papers (2024-07-05T04:09:30Z)
- Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting [26.141054975797868]
We propose a novel Adaptive Multi-Scale Decomposition (AMD) framework for time series forecasting. Our framework decomposes time series into distinct temporal patterns at multiple scales, leveraging the Multi-Scale Decomposable Mixing (MDM) block. Our approach effectively models both temporal and channel dependencies and utilizes autocorrelation to refine multi-scale data integration.
arXiv Detail & Related papers (2024-06-06T05:27:33Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three aspects of merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the community of multimedia and computer vision.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
- Multivariate Time Series Forecasting with Dynamic Graph Neural ODEs [65.18780403244178]
We propose a continuous model to forecast Multivariate Time series with dynamic Graph neural Ordinary Differential Equations (MTGODE).
Specifically, we first abstract multivariate time series into dynamic graphs with time-evolving node features and unknown graph structures.
Then, we design and solve a neural ODE to complement missing graph topologies and unify both spatial and temporal message passing.
arXiv Detail & Related papers (2022-02-17T02:17:31Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel spatio-temporal convolution block that is capable of extracting spatio-temporal patterns at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)