Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting
- URL: http://arxiv.org/abs/2512.17696v1
- Date: Fri, 19 Dec 2025 15:32:24 GMT
- Title: Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting
- Authors: Yuri Calleo
- Abstract summary: We propose a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity representations of deep learning. While Gaussian processes offer theoretical consistency and exact uncertainty quantification, their prohibitive computational scaling renders them impractical for massive sensor networks. Conversely, modern transformer architectures excel at sequence modeling but inherently lack a geometric inductive bias, treating spatial sensors as permutation-invariant tokens without a native understanding of distance. In this work, we propose a spatially-informed transformer, a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. By formally decomposing the attention structure into a stationary physical prior and a non-stationary data-driven residual, we impose a soft topological constraint that favors spatially proximal interactions while retaining the capacity to model complex dynamics. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation. Extensive experiments on synthetic Gaussian random fields and real-world traffic benchmarks confirm that our method outperforms state-of-the-art graph neural networks. Furthermore, rigorous statistical validation confirms that the proposed method delivers not only superior predictive accuracy but also well-calibrated probabilistic forecasts, effectively bridging the gap between physics-aware modeling and data-driven learning.
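To make the mechanism concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: single-head self-attention whose scores receive an additive bias from a stationary exponential covariance kernel C(d) = σ²·exp(−d/ρ), with ρ and σ² learned end-to-end. The class name, the single-head setup, the exponential kernel, and the additive pre-softmax bias are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CovarianceBiasedAttention(nn.Module):
    """Self-attention over N sensors, biased by a stationary exponential
    covariance kernel C(d) = sigma^2 * exp(-d / rho). The range rho and
    sill sigma^2 are learned by backpropagation ("Deep Variography")."""

    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Learnable geostatistical parameters (log-space keeps them positive).
        self.log_rho = nn.Parameter(torch.zeros(()))     # spatial range
        self.log_sigma2 = nn.Parameter(torch.zeros(()))  # sill / scale

    def forward(self, x: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
        # x: (B, N, d_model) sensor embeddings; dist: (N, N) pairwise distances.
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Non-stationary, data-driven scores (the residual term).
        scores = q @ k.transpose(-2, -1) / D**0.5        # (B, N, N)
        # Stationary physical prior: exponential covariance of distances.
        bias = self.log_sigma2.exp() * torch.exp(-dist / self.log_rho.exp())
        # Soft topological constraint: prior added to learned scores.
        attn = F.softmax(scores + bias, dim=-1)
        return self.out(attn @ v)

# Usage: 32 sensors in the unit square, batch of 8 sequences of embeddings.
layer = CovarianceBiasedAttention(d_model=64)
coords = torch.rand(32, 2)
dist = torch.cdist(coords, coords)                       # (32, 32)
y = layer(torch.randn(8, 32, 64), dist)                  # (8, 32, 64)
```

Because ρ and σ² receive gradients through the attention scores, minimizing the forecasting loss can pull ρ toward the true decay range of the underlying field, which is the "Deep Variography" effect the abstract describes.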
Related papers
- Leveraging generative adversarial networks with spatially adaptive denormalization for multivariate stochastic seismic data inversion [0.0]
We propose an iterative geostatistical inversion algorithm, SPADE-GANInv, for the prediction of facies and multiple correlated continuous properties from seismic data. The SPADE-GAN is trained to reproduce realistic geometries, while sequential co-simulation predicts the spatial variability of the facies-dependent continuous properties. The method is demonstrated on both 2-D synthetic scenarios and field data, targeting the prediction of facies, porosity, and acoustic impedance from full-stack seismic data.
arXiv Detail & Related papers (2025-12-02T15:25:22Z) - Deep operator network for surrogate modeling of poroelasticity with random permeability fields [3.7214007898390196]
Poroelasticity -- coupled fluid flow and elastic deformation in porous media -- often involves spatially variable permeability. In this study, we propose a surrogate modeling framework based on the deep operator network (DeepONet), a neural architecture designed to learn mappings between infinite-dimensional function spaces. To enhance predictive accuracy and stability, we integrate three strategies: nondimensionalization of the governing equations, input dimensionality reduction via Karhunen–Loève expansion, and a two-step training procedure that decouples the optimization of branch and trunk networks.
arXiv Detail & Related papers (2025-09-15T14:18:49Z) - Multidimensional Distributional Neural Network Output Demonstrated in Super-Resolution of Surface Wind Speed [0.0]
We present a framework for training neural networks with a multidimensional Gaussian loss. This framework generates closed-form predictive distributions over outputs with non-identically distributed and heteroscedastic structure. We discuss its broader applicability to uncertainty-aware prediction in scientific models.
arXiv Detail & Related papers (2025-08-21T18:22:44Z) - Certified Neural Approximations of Nonlinear Dynamics [51.01318247729693]
In safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system. We propose a novel, adaptive, and parallelizable verification method based on certified first-order models.
arXiv Detail & Related papers (2025-05-21T13:22:20Z) - Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems [49.819436680336786]
We propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems. Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics. Our ETGPSSM outperforms existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
arXiv Detail & Related papers (2025-03-24T03:19:45Z) - A Generalized Unified Skew-Normal Process with Neural Bayes Inference [1.5388334141379898]
In recent decades, statisticians have been increasingly encountering spatial data that exhibit non-Gaussian behaviors such as asymmetry and heavy-tailedness. To address the limitations of the Gaussian models, a variety of skewed models has been proposed. Among various proposals in the literature, unified skewed distributions, such as the Unified Skew-Normal (SUN), have received considerable attention.
arXiv Detail & Related papers (2024-11-26T13:00:39Z) - Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information in a content-conditioned range to aid the transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z) - Physics-informed UNets for Discovering Hidden Elasticity in Heterogeneous Materials [0.0]
We develop a novel UNet-based neural network model for inversion in elasticity (El-UNet).
We show that El-UNet achieves superior performance, in terms of both accuracy and computational cost, compared to fully-connected physics-informed neural networks.
arXiv Detail & Related papers (2023-06-01T23:35:03Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of its stochasticity in that success is still unclear.
We show that heavy-tailed behaviour commonly arises in the optimized parameters due to multiplicative noise driven by gradient variance.
A detailed analysis describes key factors, including step size and data, and shows that similar results hold on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)