A multimodal Transformer for InSAR-based ground deformation forecasting with cross-site generalization across Europe
- URL: http://arxiv.org/abs/2512.23906v1
- Date: Tue, 30 Dec 2025 00:07:36 GMT
- Title: A multimodal Transformer for InSAR-based ground deformation forecasting with cross-site generalization across Europe
- Authors: Wendong Yao, Binhua Huang, Soumyabrata Dev
- Abstract summary: We propose a patch-based Transformer for single-step, fixed-interval next-epoch nowcasting of displacement maps from EGMS time series. The model ingests recent displacement snapshots together with (i) static kinematic indicators (mean velocity, acceleration, seasonal amplitude) computed in a leakage-safe manner from the training window only, and (ii) harmonic day-of-year encodings. On the eastern Ireland tile (E32N34), the STGCN is strongest in the displacement-only setting, whereas the multimodal Transformer clearly outperforms CNN-LSTM, CNN-LSTM+Attn, and multimodal STGCN.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Near-real-time regional-scale monitoring of ground deformation is increasingly required to support urban planning, critical infrastructure management, and natural hazard mitigation. While Interferometric Synthetic Aperture Radar (InSAR) and continental-scale services such as the European Ground Motion Service (EGMS) provide dense observations of past motion, predicting the next observation remains challenging due to the superposition of long-term trends, seasonal cycles, and occasional abrupt discontinuities (e.g., co-seismic steps), together with strong spatial heterogeneity. In this study we propose a multimodal patch-based Transformer for single-step, fixed-interval next-epoch nowcasting of displacement maps from EGMS time series (resampled to a 64x64 grid over 100 km x 100 km tiles). The model ingests recent displacement snapshots together with (i) static kinematic indicators (mean velocity, acceleration, seasonal amplitude) computed in a leakage-safe manner from the training window only, and (ii) harmonic day-of-year encodings. On the eastern Ireland tile (E32N34), the STGCN is strongest in the displacement-only setting, whereas the multimodal Transformer clearly outperforms CNN-LSTM, CNN-LSTM+Attn, and multimodal STGCN when all models receive the same multimodal inputs, achieving RMSE = 0.90 mm and $R^2$ = 0.97 on the test set with the best threshold accuracies.
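The paper does not publish its feature-extraction code; below is a minimal sketch of how the leakage-safe static kinematic indicators (mean velocity, acceleration, seasonal amplitude) and the harmonic day-of-year encodings could be computed from the training window alone. The function names and the joint least-squares fit are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def kinematic_indicators(train_disp, train_days):
    """Fit a per-pixel trend plus annual harmonic on the TRAINING window only,
    so no information from validation/test epochs leaks into the features.

    train_disp : (T, H, W) displacement maps [mm]
    train_days : (T,) acquisition times in days
    """
    T, H, W = train_disp.shape
    t = (train_days - train_days[0]) / 365.25            # time in years
    w = 2.0 * np.pi                                      # annual angular frequency [rad/yr]
    # Design matrix: intercept, velocity, acceleration, annual sin/cos
    A = np.stack([np.ones_like(t), t, 0.5 * t**2,
                  np.sin(w * t), np.cos(w * t)], axis=1)  # (T, 5)
    Y = train_disp.reshape(T, -1)                         # (T, H*W)
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)          # (5, H*W)
    velocity = coef[1].reshape(H, W)                      # mm/yr
    acceleration = coef[2].reshape(H, W)                  # mm/yr^2
    seasonal_amp = np.hypot(coef[3], coef[4]).reshape(H, W)  # mm
    return velocity, acceleration, seasonal_amp

def doy_encoding(day_of_year):
    """Harmonic day-of-year encoding (sin/cos pair) for one acquisition epoch."""
    phase = 2.0 * np.pi * day_of_year / 365.25
    return np.array([np.sin(phase), np.cos(phase)])
```

Fitting all terms jointly in one least-squares problem keeps the velocity, acceleration, and seasonal estimates mutually consistent; restricting the fit to the training window is what makes the features leakage-safe.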
Related papers
- Contextual and Seasonal LSTMs for Time Series Anomaly Detection [49.50689313712684]
We propose a novel prediction-based framework named Contextual and Seasonal LSTMs (CS-LSTMs). CS-LSTMs are built upon a noise decomposition strategy and jointly leverage contextual dependencies and seasonal patterns. They consistently outperform state-of-the-art methods, highlighting their effectiveness and practical value in robust time series anomaly detection.
arXiv Detail & Related papers (2026-02-10T11:46:15Z) - Spatio-Temporal Transformers for Long-Term NDVI Forecasting [0.5097809301149342]
STT-LTF processes multi-scale spatial patches alongside temporal sequences (up to 20 years) through a unified transformer architecture. The framework employs comprehensive self-supervised learning with spatial masking, temporal masking, and horizon sampling strategies. It directly predicts arbitrary future time points without error accumulation, incorporating spatial patch embeddings, cyclical temporal encoding, and geographic coordinates.
arXiv Detail & Related papers (2026-02-02T08:29:45Z) - Breaking the Regional Barrier: Inductive Semantic Topology Learning for Worldwide Air Quality Forecasting [99.4484686548807]
We propose OmniAir, a semantic topology learning framework tailored for global station-level prediction. Our approach effectively captures long-range non-Euclidean correlations and physical diffusion patterns across unevenly distributed global networks. Experiments show that OmniAir achieves state-of-the-art performance against 18 baselines, maintaining high efficiency and scalability with speeds nearly 10 times faster than existing models.
arXiv Detail & Related papers (2026-01-29T15:58:07Z) - Scalable Transit Delay Prediction at City Scale: A Systematic Approach with Multi-Resolution Feature Engineering and Deep Learning [1.065661841579261]
Most existing delay prediction systems handle only a few routes, depend on hand-crafted features, and offer little guidance on how to design a reusable architecture. We present a city-scale prediction pipeline that combines multi-resolution feature engineering, dimensionality reduction, and deep learning. A global LSTM with cluster-aware features achieves the best trade-off between accuracy and efficiency, outperforming transformer models by 18% to 52%.
arXiv Detail & Related papers (2026-01-26T14:30:50Z) - MSTN: Fast and Efficient Multivariate Time Series Model [0.0]
We introduce the Multi-scale Temporal Network (MSTN), a novel deep learning architecture founded on a hierarchical multi-scale and sequence modeling principle. MSTN integrates a convolutional encoder that constructs a hierarchical feature pyramid for local patterns and a sequence modeling component for long-range temporal dependencies. Extensive evaluations across time-series long-horizon forecasting, imputation, classification, and a generalizability study demonstrate that MSTN achieves competitive state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2025-11-25T18:09:42Z) - Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning [70.56067503630486]
We argue that sixth-generation (6G) intelligence is not fluent token prediction but the calibrated capacity to imagine and choose. We show that WM-MS3M cuts mean absolute error (MAE) by 1.69% versus MS3M with 32% fewer parameters and similar latency, and achieves 35-80% lower root mean squared error (RMSE) than attention/hybrid baselines with 2.3-4.1x faster inference.
arXiv Detail & Related papers (2025-11-04T17:22:22Z) - FTT-GRU: A Hybrid Fast Temporal Transformer with GRU for Remaining Useful Life Prediction [0.6421270655703623]
We propose a hybrid model, FTT-GRU, which combines a Fast Temporal Transformer (FTT) with a gated recurrent unit (GRU) layer for sequential modeling. On NASA CMAPSS FD001, FTT-GRU attains RMSE 30.76, MAE 18.97, and $R^2 = 0.45$, with 1.12 ms CPU latency at batch size 1. These results demonstrate that a compact Transformer-RNN hybrid delivers accurate and efficient RUL predictions on CMAPSS.
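The RMSE, MAE, and $R^2$ figures quoted in this entry (and for the main paper above) follow the standard definitions; a minimal sketch, with function names chosen for illustration:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Note that $R^2$ is computed against the variance of the ground truth, so a model that always predicts the mean scores 0 and a perfect model scores 1.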
arXiv Detail & Related papers (2025-11-01T14:02:03Z) - Enhancing Spatiotemporal Networks with xLSTM: A Scalar LSTM Approach for Cellular Traffic Forecasting [0.7111641404908191]
We introduce a lightweight, dual-path Spatiotemporal Network that leverages a scalar LSTM (sLSTM) for efficient temporal modeling and a three-layer Conv3D module for spatial feature extraction. We show superior forecast performance over ConvLSTM baselines and strong generalization to unseen regions, making it well-suited for large-scale next-generation network deployments.
arXiv Detail & Related papers (2025-07-17T22:48:46Z) - PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments [73.80718037070773]
We present the multi-modal Pedestrian-Focused Scene dataset, rigorously annotated in semi-structured scenes following the nuScenes format. We also propose a novel Hybrid Multi-Scale Fusion Network (HMFN) to detect pedestrians in densely populated and occluded scenarios.
arXiv Detail & Related papers (2025-02-21T09:57:53Z) - S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting [31.19126944008011]
Time series forecasting has recently achieved significant progress with multi-scale models that address the heterogeneity between long- and short-range patterns. We propose the State Space Transformer with cross-attention (S2TX) to address these concerns. S2TX can achieve highly robust SOTA results while maintaining a low memory footprint.
arXiv Detail & Related papers (2025-02-17T01:40:45Z) - Global-to-Local Modeling for Video-based 3D Human Pose and Shape
Estimation [53.04781510348416]
Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, the Global-to-Local Transformer (GLoT).
Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z) - Low-Rank Autoregressive Tensor Completion for Spatiotemporal Traffic Data Imputation [4.9831085918734805]
Missing data imputation has been a long-standing research topic and critical application for real-world intelligent transportation systems.
We propose a low-rank autoregressive tensor completion (LATC) framework by introducing temporal variation as a new regularization term.
We conduct extensive numerical experiments on several real-world traffic data sets, and our results demonstrate the effectiveness of LATC in diverse missing scenarios.
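LATC's temporal-variation regularizer penalizes the residual of an autoregressive model along the time axis. A minimal sketch of that idea follows; it uses fixed AR coefficients for illustration, not the paper's full alternating-minimization algorithm, and the function name is an assumption:

```python
import numpy as np

def temporal_variation(X, coeffs):
    """Squared AR residual: sum_t || x_t - sum_k a_k * x_{t-k} ||^2.

    X      : (T, N) matrix, one column per sensor/location
    coeffs : AR coefficients [a_1, ..., a_p]
    """
    X = np.asarray(X, dtype=float)
    p = len(coeffs)
    T = X.shape[0]
    resid = X[p:].copy()                   # x_t for t = p..T-1
    for k, a in enumerate(coeffs, start=1):
        resid -= a * X[p - k: T - k]       # x_{t-k} aligned with x_t
    return float(np.sum(resid ** 2))
```

A small penalty means the series is well explained by its own recent past, which is the structure the regularizer rewards during completion.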
arXiv Detail & Related papers (2021-04-30T12:00:57Z) - A Generative Learning Approach for Spatio-temporal Modeling in Connected Vehicular Network [55.852401381113786]
This paper proposes LaMI (Latency Model Inpainting), a novel framework to generate a comprehensive spatio-temporal model of the wireless access latency of connected vehicles.
LaMI adopts the idea from image inpainting and synthesizing and can reconstruct the missing latency samples by a two-step procedure.
In particular, it first discovers the spatial correlation between samples collected in various regions using a patching-based approach and then feeds the original and highly correlated samples into a Variational Autoencoder (VAE).
arXiv Detail & Related papers (2020-03-16T03:43:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.