A multimodal Transformer for InSAR-based ground deformation forecasting with cross-site generalization across Europe
- URL: http://arxiv.org/abs/2512.23906v1
- Date: Tue, 30 Dec 2025 00:07:36 GMT
- Title: A multimodal Transformer for InSAR-based ground deformation forecasting with cross-site generalization across Europe
- Authors: Wendong Yao, Binhua Huang, Soumyabrata Dev
- Abstract summary: We propose a patch-based Transformer for single-step, fixed-interval next-epoch nowcasting of displacement maps from EGMS time series. The model ingests recent displacement snapshots together with (i) static kinematic indicators (mean velocity, acceleration, seasonal amplitude) computed in a leakage-safe manner from the training window only, and (ii) harmonic day-of-year encodings. On the eastern Ireland tile (E32N34), the STGCN is strongest in the displacement-only setting, whereas the multimodal Transformer clearly outperforms CNN-LSTM, CNN-LSTM+Attn, and multimodal STGCN.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Near-real-time regional-scale monitoring of ground deformation is increasingly required to support urban planning, critical infrastructure management, and natural hazard mitigation. While Interferometric Synthetic Aperture Radar (InSAR) and continental-scale services such as the European Ground Motion Service (EGMS) provide dense observations of past motion, predicting the next observation remains challenging due to the superposition of long-term trends, seasonal cycles, and occasional abrupt discontinuities (e.g., co-seismic steps), together with strong spatial heterogeneity. In this study we propose a multimodal patch-based Transformer for single-step, fixed-interval next-epoch nowcasting of displacement maps from EGMS time series (resampled to a 64x64 grid over 100 km x 100 km tiles). The model ingests recent displacement snapshots together with (i) static kinematic indicators (mean velocity, acceleration, seasonal amplitude) computed in a leakage-safe manner from the training window only, and (ii) harmonic day-of-year encodings. On the eastern Ireland tile (E32N34), the STGCN is strongest in the displacement-only setting, whereas the multimodal Transformer clearly outperforms CNN-LSTM, CNN-LSTM+Attn, and multimodal STGCN when all models receive the same multimodal inputs, achieving RMSE = 0.90 mm and $R^2$ = 0.97 on the test set with the best threshold accuracies.
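The paper does not publish its feature-extraction code; below is a minimal sketch of how the leakage-safe static kinematic indicators (mean velocity, acceleration, seasonal amplitude) and the harmonic day-of-year encodings could be computed from the training window alone. The function names and the joint least-squares fit are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def kinematic_indicators(train_disp, train_days):
    """Fit a per-pixel trend plus annual harmonic on the TRAINING window only,
    so no information from validation/test epochs leaks into the features.

    train_disp : (T, H, W) displacement maps [mm]
    train_days : (T,) acquisition times in days
    """
    T, H, W = train_disp.shape
    t = (train_days - train_days[0]) / 365.25            # time in years
    w = 2.0 * np.pi                                      # annual angular frequency [rad/yr]
    # Design matrix: intercept, velocity, acceleration, annual sin/cos
    A = np.stack([np.ones_like(t), t, 0.5 * t**2,
                  np.sin(w * t), np.cos(w * t)], axis=1)  # (T, 5)
    Y = train_disp.reshape(T, -1)                         # (T, H*W)
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)          # (5, H*W)
    velocity = coef[1].reshape(H, W)                      # mm/yr
    acceleration = coef[2].reshape(H, W)                  # mm/yr^2
    seasonal_amp = np.hypot(coef[3], coef[4]).reshape(H, W)  # mm
    return velocity, acceleration, seasonal_amp

def doy_encoding(day_of_year):
    """Harmonic day-of-year encoding (sin/cos pair) for one acquisition epoch."""
    phase = 2.0 * np.pi * day_of_year / 365.25
    return np.array([np.sin(phase), np.cos(phase)])
```

Fitting all terms jointly in one least-squares problem keeps the velocity, acceleration, and seasonal estimates mutually consistent; restricting the fit to the training window is what makes the features leakage-safe.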
Related papers
- Contextual and Seasonal LSTMs for Time Series Anomaly Detection [49.50689313712684]
We propose a novel prediction-based framework named Contextual and Seasonal LSTMs (CS-LSTMs). CS-LSTMs are built upon a noise decomposition strategy and jointly leverage contextual dependencies and seasonal patterns. They consistently outperform state-of-the-art methods, highlighting their effectiveness and practical value in robust time series anomaly detection.
arXiv Detail & Related papers (2026-02-10T11:46:15Z) - Spatio-Temporal Transformers for Long-Term NDVI Forecasting [0.5097809301149342]
STT-LTF processes multi-scale spatial patches alongside temporal sequences (up to 20 years) through a unified transformer architecture. The framework employs comprehensive self-supervised learning with spatial masking, temporal masking, and horizon sampling strategies. It directly predicts arbitrary future time points without error accumulation, incorporating spatial patch embeddings, cyclical temporal encoding, and geographic coordinates.
arXiv Detail & Related papers (2026-02-02T08:29:45Z) - Breaking the Regional Barrier: Inductive Semantic Topology Learning for Worldwide Air Quality Forecasting [99.4484686548807]
We propose OmniAir, a semantic topology learning framework tailored for global station-level prediction. Our approach effectively captures long-range non-Euclidean correlations and physical diffusion patterns across unevenly distributed global networks. Experiments show that OmniAir achieves state-of-the-art performance against 18 baselines, maintaining high efficiency and scalability with speeds nearly 10 times faster than existing models.
arXiv Detail & Related papers (2026-01-29T15:58:07Z) - Scalable Transit Delay Prediction at City Scale: A Systematic Approach with Multi-Resolution Feature Engineering and Deep Learning [1.065661841579261]
Most existing delay prediction systems handle only a few routes, depend on hand-crafted features, and offer little guidance on how to design a reusable architecture. We present a city-scale prediction pipeline that combines multi-resolution feature engineering, dimensionality reduction, and deep learning. A global LSTM with cluster-aware features achieves the best trade-off between accuracy and efficiency, outperforming transformer models by 18% to 52%.
arXiv Detail & Related papers (2026-01-26T14:30:50Z) - MSTN: Fast and Efficient Multivariate Time Series Model [0.0]
We introduce the Multi-scale Temporal Network (MSTN), a novel deep learning architecture founded on a hierarchical multi-scale and sequence modeling principle. MSTN integrates a convolutional encoder that constructs a hierarchical feature pyramid for local patterns and a sequence modeling component for long-range temporal dependencies. Extensive evaluations across time-series long-horizon forecasting, imputation, classification, and a generalizability study demonstrate that MSTN achieves competitive state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2025-11-25T18:09:42Z) - Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning [70.56067503630486]
We argue that sixth-generation (6G) intelligence is not fluent token prediction but the calibrated capacity to imagine and choose. We show that WM-MS3M cuts mean absolute error (MAE) by 1.69% versus MS3M with 32% fewer parameters and similar latency, and achieves 35-80% lower root mean squared error (RMSE) than attention/hybrid baselines with 2.3-4.1x faster inference.
arXiv Detail & Related papers (2025-11-04T17:22:22Z) - FTT-GRU: A Hybrid Fast Temporal Transformer with GRU for Remaining Useful Life Prediction [0.6421270655703623]
We propose a hybrid model, FTT-GRU, which combines a Fast Temporal Transformer (FTT) with a gated recurrent unit (GRU) layer for sequential modeling. On NASA CMAPSS FD001, FTT-GRU attains RMSE 30.76, MAE 18.97, and $R^2 = 0.45$, with 1.12 ms CPU latency at batch size 1. These results demonstrate that a compact Transformer-RNN hybrid delivers accurate and efficient RUL predictions on CMAPSS.
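The RMSE, MAE, and $R^2$ figures quoted in this entry (and for the main paper above) follow the standard definitions; a minimal sketch, with function names chosen for illustration:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Note that $R^2$ is computed against the variance of the ground truth, so a model that always predicts the mean scores 0 and a perfect model scores 1.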
arXiv Detail & Related papers (2025-11-01T14:02:03Z) - Enhancing Spatiotemporal Networks with xLSTM: A Scalar LSTM Approach for Cellular Traffic Forecasting [0.7111641404908191]
We introduce a lightweight, dual-path Spatiotemporal Network that leverages a scalar LSTM (sLSTM) for efficient temporal modeling and a three-layer Conv3D module for spatial feature extraction. We show superior forecast performance over ConvLSTM baselines and strong generalization to unseen regions, making it well-suited for large-scale next-generation network deployments.
arXiv Detail & Related papers (2025-07-17T22:48:46Z) - PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments [73.80718037070773]
We present the multi-modal Pedestrian-Focused Scene dataset, rigorously annotated in semi-structured scenes following the nuScenes format. We also propose a novel Hybrid Multi-Scale Fusion Network (HMFN) to detect pedestrians in densely populated and occluded scenarios.
arXiv Detail & Related papers (2025-02-21T09:57:53Z) - S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting [31.19126944008011]
Time series forecasting has recently achieved significant progress with multi-scale models that address the heterogeneity between long- and short-range patterns. We propose the State Space Transformer with cross-attention (S2TX) to address these concerns. S2TX can achieve highly robust SOTA results while maintaining a low memory footprint.
arXiv Detail & Related papers (2025-02-17T01:40:45Z) - Global-to-Local Modeling for Video-based 3D Human Pose and Shape
Estimation [53.04781510348416]
Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, the Global-to-Local Transformer (GLoT).
Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z) - Low-Rank Autoregressive Tensor Completion for Spatiotemporal Traffic Data Imputation [4.9831085918734805]
Missing data imputation has been a long-standing research topic and critical application for real-world intelligent transportation systems.
We propose a low-rank autoregressive tensor completion (LATC) framework by introducing temporal variation as a new regularization term.
We conduct extensive numerical experiments on several real-world traffic data sets, and our results demonstrate the effectiveness of LATC in diverse missing scenarios.
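LATC's temporal-variation regularizer penalizes the residual of an autoregressive model along the time axis. A minimal sketch of that idea follows; it uses fixed AR coefficients for illustration, not the paper's full alternating-minimization algorithm, and the function name is an assumption:

```python
import numpy as np

def temporal_variation(X, coeffs):
    """Squared AR residual: sum_t || x_t - sum_k a_k * x_{t-k} ||^2.

    X      : (T, N) matrix, one column per sensor/location
    coeffs : AR coefficients [a_1, ..., a_p]
    """
    X = np.asarray(X, dtype=float)
    p = len(coeffs)
    T = X.shape[0]
    resid = X[p:].copy()                   # x_t for t = p..T-1
    for k, a in enumerate(coeffs, start=1):
        resid -= a * X[p - k: T - k]       # x_{t-k} aligned with x_t
    return float(np.sum(resid ** 2))
```

A small penalty means the series is well explained by its own recent past, which is the structure the regularizer rewards during completion.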
arXiv Detail & Related papers (2021-04-30T12:00:57Z) - A Generative Learning Approach for Spatio-temporal Modeling in Connected Vehicular Network [55.852401381113786]
This paper proposes LaMI (Latency Model Inpainting), a novel framework to generate a comprehensive spatio-temporal model of the wireless access latency of connected vehicles.
LaMI adopts the idea from image inpainting and synthesizing and can reconstruct the missing latency samples by a two-step procedure.
In particular, it first discovers the spatial correlation between samples collected in various regions using a patching-based approach and then feeds the original and highly correlated samples into a Variational Autoencoder (VAE).
arXiv Detail & Related papers (2020-03-16T03:43:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.