Multi-modal Spatio-Temporal Transformer for High-resolution Land Subsidence Prediction
- URL: http://arxiv.org/abs/2509.25393v2
- Date: Wed, 01 Oct 2025 11:00:59 GMT
- Title: Multi-modal Spatio-Temporal Transformer for High-resolution Land Subsidence Prediction
- Authors: Wendong Yao, Binhua Huang, Soumyabrata Dev
- Abstract summary: We propose a novel framework that fuses dynamic displacement data with static physical priors. On the public EGMS dataset, MM-STT establishes a new state-of-the-art, reducing the long-range forecast RMSE by an order of magnitude.
- Score: 3.3295066998131637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Forecasting high-resolution land subsidence is a critical yet challenging task due to its complex, non-linear dynamics. While standard architectures like ConvLSTM often fail to model long-range dependencies, we argue that a more fundamental limitation of prior work lies in the uni-modal data paradigm. To address this, we propose the Multi-Modal Spatio-Temporal Transformer (MM-STT), a novel framework that fuses dynamic displacement data with static physical priors. Its core innovation is a joint spatio-temporal attention mechanism that processes all multi-modal features in a unified manner. On the public EGMS dataset, MM-STT establishes a new state-of-the-art, reducing the long-range forecast RMSE by an order of magnitude compared to all baselines, including SOTA methods like STGCN and STAEformer. Our results demonstrate that for this class of problems, an architecture's inherent capacity for deep multi-modal fusion is paramount for achieving transformative performance.
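To make the fusion idea concrete, below is a minimal sketch of joint spatio-temporal attention over concatenated dynamic and static tokens. The layer sizes, the three static prior channels, and the fusion-by-concatenation scheme are illustrative assumptions, not the released MM-STT architecture.

```python
import torch
import torch.nn as nn

class JointSpatioTemporalAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.dyn_proj = nn.Linear(1, d_model)     # one displacement value per space-time token
        self.static_proj = nn.Linear(3, d_model)  # hypothetical static priors (e.g. geology, DEM)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, dyn, static):
        # dyn: (B, T, H, W) displacement maps; static: (B, H, W, 3) physical priors
        # (positional encodings omitted for brevity)
        B, T, H, W = dyn.shape
        dyn_tok = self.dyn_proj(dyn.reshape(B, T * H * W, 1))    # space-time tokens
        sta_tok = self.static_proj(static.reshape(B, H * W, 3))  # static prior tokens
        z = self.encoder(torch.cat([dyn_tok, sta_tok], dim=1))   # joint attention over all tokens
        last = z[:, (T - 1) * H * W : T * H * W]                 # tokens of the latest epoch
        return self.head(last).reshape(B, H, W)                  # next-step displacement map

pred = JointSpatioTemporalAttention()(torch.randn(2, 4, 8, 8), torch.randn(2, 8, 8, 3))
print(pred.shape)  # torch.Size([2, 8, 8])
```

The key property the sketch preserves is that every dynamic token can attend to every static-prior token in one unified sequence, rather than fusing modalities in separate streams.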
Related papers
- A Decomposition-based State Space Model for Multivariate Time-Series Forecasting [0.0]
We propose an end-to-end decomposition framework using three parallel deep state space model branches to capture trend, seasonal, and residual components. Across standard benchmarks, DecompSSM outperformed strong baselines, indicating the effectiveness of combining component-wise deep state space models and global context refinement.
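As a rough illustration of the component-wise idea (not the paper's code), the sketch below splits a series into trend, seasonal, and residual parts and forecasts each with a parallel branch; plain GRUs stand in for the deep state space branches, and the decomposition rule is an assumption.

```python
import torch
import torch.nn as nn

def decompose(x, kernel=5, period=12):
    # x: (B, T, 1). Moving-average trend, periodic-mean seasonal, remainder residual.
    trend = nn.functional.avg_pool1d(
        x.transpose(1, 2), kernel, stride=1, padding=kernel // 2, count_include_pad=False
    ).transpose(1, 2)
    detrended = x - trend
    idx = torch.arange(x.shape[1]) % period
    seasonal = torch.zeros_like(x)
    for p in range(period):
        mask = idx == p
        seasonal[:, mask] = detrended[:, mask].mean(dim=1, keepdim=True)
    return trend, seasonal, x - trend - seasonal

class Branch(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)
    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h[:, -1])  # one-step-ahead forecast per component

class DecompForecaster(nn.Module):
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList(Branch() for _ in range(3))
    def forward(self, x):
        return sum(b(p) for b, p in zip(self.branches, decompose(x)))

print(DecompForecaster()(torch.randn(4, 48, 1)).shape)  # torch.Size([4, 1])
```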
arXiv Detail & Related papers (2026-02-05T07:17:08Z)
- SpanNorm: Reconciling Training Stability and Performance in Deep Transformers [55.100133502295996]
We propose SpanNorm, a novel technique designed to resolve the stability-performance dilemma by integrating the strengths of both paradigms. We provide a theoretical analysis demonstrating that SpanNorm, combined with a principled scaling strategy, maintains bounded signal variance throughout the network. Empirically, SpanNorm consistently outperforms standard normalization schemes in both dense and Mixture-of-Experts (MoE) scenarios.
arXiv Detail & Related papers (2026-01-30T05:21:57Z)
- AR-MOT: Autoregressive Multi-object Tracking [56.09738000988466]
We propose a novel autoregressive paradigm that formulates MOT as a sequence generation task within a large language model (LLM) framework. This design enables the model to output structured results through flexible sequence construction, without requiring any task-specific heads. To enhance region-level visual perception, we introduce an Object Tokenizer based on a pretrained detector.
arXiv Detail & Related papers (2026-01-05T09:17:28Z)
- Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems [38.4555621948915]
The Prismatic World Model (PRISM-WM) is designed to decompose complex hybrid dynamics into composable primitives. PRISM-WM significantly reduces rollout drift by accurately modeling sharp mode transitions in system dynamics.
arXiv Detail & Related papers (2025-12-09T09:40:34Z)
- MSTN: Fast and Efficient Multivariate Time Series Model [0.0]
We introduce the Multi-scale Temporal Network (MSTN), a novel deep learning architecture founded on hierarchical multi-scale feature extraction and sequence modeling. MSTN integrates a convolutional encoder that constructs a hierarchical feature pyramid for local patterns and a sequence modeling component for long-range temporal dependencies. Extensive evaluations across long-horizon time-series forecasting, imputation, classification, and generalizability studies demonstrate that MSTN achieves competitive state-of-the-art (SOTA) performance.
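A rough sketch of a hierarchical convolutional pyramid feeding a sequence model, in the spirit of the description above; the channel sizes, pyramid depth, and GRU choice are illustrative guesses, not the actual MSTN configuration.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    def __init__(self, in_ch=7, ch=32, levels=3):
        super().__init__()
        self.stages = nn.ModuleList()
        c = in_ch
        for _ in range(levels):
            self.stages.append(nn.Sequential(
                nn.Conv1d(c, ch, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool1d(2),  # halve temporal resolution at each pyramid level
            ))
            c = ch
        self.rnn = nn.GRU(ch * levels, 64, batch_first=True)

    def forward(self, x):
        # x: (B, T, C) multivariate series -> feature pyramid of local patterns
        h = x.transpose(1, 2)
        feats = []
        for stage in self.stages:
            h = stage(h)
            # resample each level to a common length before fusing the pyramid
            feats.append(nn.functional.interpolate(h, size=x.shape[1] // 2))
        fused = torch.cat(feats, dim=1).transpose(1, 2)
        out, _ = self.rnn(fused)  # long-range temporal dependencies
        return out[:, -1]

print(MultiScaleEncoder()(torch.randn(8, 96, 7)).shape)  # torch.Size([8, 64])
```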
arXiv Detail & Related papers (2025-11-25T18:09:42Z)
- NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms. Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks. To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z)
- A Deep Learning Approach for Spatio-Temporal Forecasting of InSAR Ground Deformation in Eastern Ireland [2.840858735842673]
Monitoring ground displacement is crucial for protecting urban infrastructure and mitigating geological hazards. This paper introduces a novel deep learning framework that transforms sparse point measurements into a dense spatio-temporal tensor. Results demonstrate that the proposed architecture provides more accurate and spatially coherent forecasts.
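The "sparse points into a dense spatio-temporal tensor" step could look like the gridding routine below; the grid size, bounding-box handling, and cell-averaging rule are assumptions for illustration, not the paper's preprocessing.

```python
import numpy as np

def rasterize(points_xy, series, bounds, grid=(64, 64)):
    """points_xy: (N, 2) lon/lat; series: (N, T) displacement per scatterer."""
    (xmin, ymin, xmax, ymax), (H, W) = bounds, grid
    col = np.clip(((points_xy[:, 0] - xmin) / (xmax - xmin) * W).astype(int), 0, W - 1)
    row = np.clip(((points_xy[:, 1] - ymin) / (ymax - ymin) * H).astype(int), 0, H - 1)
    dense = np.zeros((series.shape[1], H, W))
    count = np.zeros((H, W))
    np.add.at(count, (row, col), 1)  # how many points fall in each cell
    for t in range(series.shape[1]):
        acc = np.zeros((H, W))
        np.add.at(acc, (row, col), series[:, t])  # average points sharing a cell
        dense[t] = np.divide(acc, count, out=np.zeros_like(acc), where=count > 0)
    return dense  # (T, H, W), zeros where no scatterer falls in a cell

tensor = rasterize(np.random.rand(500, 2), np.random.randn(500, 24), (0.0, 0.0, 1.0, 1.0))
print(tensor.shape)  # (24, 64, 64)
```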
arXiv Detail & Related papers (2025-09-17T17:10:18Z)
- OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation [91.45421429922506]
OneCAT is a unified multimodal model that seamlessly integrates understanding, generation, and editing. Our framework eliminates the need for external components such as Vision Transformers (ViT) or a vision tokenizer during inference.
arXiv Detail & Related papers (2025-09-03T17:29:50Z)
- DMSC: Dynamic Multi-Scale Coordination Framework for Time Series Forecasting [14.176801586961286]
Time Series Forecasting (TSF) faces persistent challenges in modeling intricate temporal dependencies across different scales. We propose a novel Dynamic Multi-Scale Coordination Framework (DMSC) with a Multi-Scale Patch Decomposition block (EMPD), a Triad Interaction Block (TIB), and an Adaptive Scale Routing MoE block (ASR-MoE). EMPD is designed as a built-in component to dynamically segment sequences into hierarchical patches with exponentially scaled granularities. TIB then jointly models intra-patch, inter-patch, and cross-variable dependencies within each layer's decomposed representations.
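A toy version of the exponentially scaled patching that EMPD describes; the base patch length and the mean-pool patch summaries are illustrative choices, not the paper's exact design.

```python
import torch

def multi_scale_patches(x, base=4, levels=3):
    # x: (B, T, C) -> list of (B, T // p, C) patch summaries for p = 4, 8, 16, ...
    out = []
    for k in range(levels):
        p = base * (2 ** k)                # exponentially scaled granularity
        T = (x.shape[1] // p) * p          # drop the ragged tail, if any
        patches = x[:, :T].reshape(x.shape[0], T // p, p, x.shape[2])
        out.append(patches.mean(dim=2))    # one summary token per patch
    return out

scales = multi_scale_patches(torch.randn(2, 96, 5))
print([tuple(s.shape) for s in scales])  # [(2, 24, 5), (2, 12, 5), (2, 6, 5)]
```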
arXiv Detail & Related papers (2025-08-03T13:11:52Z)
- FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [57.577843653775]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
- World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks [53.98633183204453]
In this paper, a novel world model-based learning framework is proposed to minimize the packet-completeness-aware age of information (CAoI) in a vehicular network. The framework jointly learns a dynamic model of the mmWave V2X environment and uses it to imagine trajectories for learning how to perform link scheduling. In particular, the long-term policy is learned from differentiable imagined trajectories instead of environment interactions.
arXiv Detail & Related papers (2025-05-03T06:23:18Z)
- DG-STMTL: A Novel Graph Convolutional Network for Multi-Task Spatio-Temporal Traffic Forecasting [0.0]
A key challenge for accurate prediction is how to model complex spatio-temporal dependencies and adapt to the inherent dynamics in the data. Traditional Graph Convolutional Networks (GCNs) often struggle with static adjacency matrices that introduce bias or learnable matrices that may overfit to specific patterns. This study introduces a novel MTL framework, Dynamic Group-wise Spatio-Temporal Multi-Task Learning (DG-STMTL).
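To illustrate the static-versus-learnable adjacency tension, here is a hedged sketch of a graph convolution that mixes a fixed prior adjacency with a learnable one; the gating and normalization choices are illustrative, not the DG-STMTL design.

```python
import torch
import torch.nn as nn

class HybridGCNLayer(nn.Module):
    def __init__(self, n_nodes, in_dim, out_dim):
        super().__init__()
        self.adapt = nn.Parameter(torch.zeros(n_nodes, n_nodes))  # learnable adjacency part
        self.alpha = nn.Parameter(torch.tensor(0.5))              # mixing gate (unconstrained here)
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_static):
        # x: (B, N, F); a_static: (N, N) prior adjacency (e.g. road topology)
        a = self.alpha * a_static + (1 - self.alpha) * torch.softmax(self.adapt, dim=-1)
        deg = a.sum(-1, keepdim=True).clamp(min=1e-6)
        return torch.relu(self.lin((a / deg) @ x))  # row-normalized propagation

layer = HybridGCNLayer(10, 8, 16)
print(layer(torch.randn(4, 10, 8), torch.eye(10)).shape)  # torch.Size([4, 10, 16])
```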
arXiv Detail & Related papers (2025-04-10T15:00:20Z)
- UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines [64.84631333071728]
We introduce UniSTD, a unified Transformer-based framework for spatio-temporal modeling. Our work demonstrates that a task-specific vision-text model can build a generalizable model for spatio-temporal learning. We also introduce a temporal module to incorporate temporal dynamics explicitly.
arXiv Detail & Related papers (2025-03-26T17:33:23Z)
- Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships. Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
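One way to read "factor models plus neural networks" is a low-dimensional factor extraction followed by a nonlinear link; the sketch below is a minimal, assumption-laden version of that idea, not the paper's actual estimator.

```python
import torch
import torch.nn as nn

class FactorRegressor(nn.Module):
    def __init__(self, p=200, k=8):
        super().__init__()
        self.loadings = nn.Linear(p, k, bias=False)  # factor extraction: z = Lambda^T x
        self.g = nn.Sequential(                      # non-parametric link: y = g(z)
            nn.Linear(k, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, x):
        return self.g(self.loadings(x))

model = FactorRegressor()
print(model(torch.randn(16, 200)).shape)  # torch.Size([16, 1])
```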
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
- Multi-Source Knowledge-Based Hybrid Neural Framework for Time Series Representation Learning [2.368662284133926]
The proposed hybrid architecture addresses these limitations by combining domain-specific knowledge with implicit knowledge of the relational structure underlying the MTS data.
The architecture shows promising results on multiple benchmark datasets, outperforming state-of-the-art forecasting methods.
arXiv Detail & Related papers (2024-08-22T13:58:55Z)
- Multi-Modality Spatio-Temporal Forecasting via Self-Supervised Learning [11.19088022423885]
We propose a novel multi-modality spatio-temporal (MoST) learning framework via self-supervised learning, namely MoSSL.
Results on two real-world MoST datasets verify the superiority of our approach compared with the state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-06T08:24:06Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted for their high prediction capacity, but the self-attention mechanism is computationally expensive. We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.