Learning From Simulators: A Theory of Simulation-Grounded Learning
- URL: http://arxiv.org/abs/2509.18990v2
- Date: Wed, 01 Oct 2025 20:40:45 GMT
- Title: Learning From Simulators: A Theory of Simulation-Grounded Learning
- Authors: Carson Dudley, Marisa Eisenberg
- Abstract summary: Simulation-Grounded Neural Networks (SGNNs) are predictive models trained entirely on synthetic data from mechanistic simulations. We place SGNNs in a unified statistical framework. Under standard loss functions, they can be interpreted as amortized Bayesian predictors trained under a simulator-induced prior. We provide numerical experiments to validate theoretical predictions. SGNNs recover latent parameters, remain robust under mismatch, and outperform classical tools.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Simulation-Grounded Neural Networks (SGNNs) are predictive models trained entirely on synthetic data from mechanistic simulations. They have achieved state-of-the-art performance in domains where real-world labels are limited or unobserved, but lack a formal underpinning. We place SGNNs in a unified statistical framework. Under standard loss functions, they can be interpreted as amortized Bayesian predictors trained under a simulator-induced prior. Empirical risk minimization then yields convergence to the Bayes-optimal predictor under the synthetic distribution. We employ classical results on distribution shift to characterize how performance degrades when the simulator diverges from reality. Beyond these consequences, we develop SGNN-specific results: (i) conditions under which unobserved scientific parameters are learnable via simulation, and (ii) a back-to-simulation attribution method that provides mechanistic explanations of predictions by linking them to the simulations the model deems similar, with guarantees of posterior consistency. We provide numerical experiments to validate theoretical predictions. SGNNs recover latent parameters, remain robust under mismatch, and outperform classical tools: in a model selection task, SGNNs achieve half the error of AIC in distinguishing mechanistic dynamics. These results establish SGNNs as a principled and practical framework for scientific prediction in data-limited regimes.
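The recipe the abstract describes is, concretely, "simulate, label, fit." Below is a minimal, hypothetical sketch of that loop (not the authors' code): a stochastic SIR simulator draws the latent transmission rate from a prior and emits noisy trajectories, and a small network is fit by empirical risk minimization on fresh synthetic batches. Under squared loss, the minimizer approaches the posterior mean E[theta | x] under the simulator-induced prior, which is the amortized-Bayes interpretation stated above.

```python
import torch
import torch.nn as nn

def simulate_sir(n, t_steps=50, dt=0.5):
    """Hypothetical mechanistic simulator: latent transmission rate drawn
    from a prior, stochastic SIR dynamics, noisy incidence observations."""
    beta = 0.1 + 0.9 * torch.rand(n)            # simulator-induced prior on theta
    gamma = 0.2                                 # fixed recovery rate
    s = torch.full((n,), 0.99)
    i = torch.full((n,), 0.01)
    obs = []
    for _ in range(t_steps):
        new_inf = beta * s * i * dt
        rec = gamma * i * dt
        s, i = s - new_inf, i + new_inf - rec
        obs.append(i + 0.01 * torch.randn(n))   # observation noise
    return torch.stack(obs, dim=1), beta.unsqueeze(1)

# Amortized predictor: maps an observed trajectory directly to an estimate of theta.
net = nn.Sequential(nn.Linear(50, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x, theta = simulate_sir(256)                    # fresh synthetic batch
    loss = nn.functional.mse_loss(net(x), theta)    # squared loss -> posterior mean
    opt.zero_grad(); loss.backward(); opt.step()
```

The same recipe covers the model selection experiment: swap the regression head for a softmax over candidate mechanisms and train with cross-entropy, which in this framework converges to posterior model probabilities under the synthetic distribution.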
Related papers
- BLIPs: Bayesian Learned Interatomic Potentials [47.73617239750485]
Machine Learning Interatomic Potentials (MLIPs) are becoming a central tool in simulation-based chemistry. MLIPs do not provide uncertainty estimates by construction, which are fundamental for guiding active learning pipelines. BLIPs offer a scalable, architecture-agnostic variational Bayesian framework for training or fine-tuning MLIPs.
arXiv Detail & Related papers (2025-08-19T17:28:14Z) - Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery [0.0]
We introduce Simulation-Grounded Neural Networks (SGNNs), a framework that uses mechanistic simulations as training data for neural networks. SGNNs achieve state-of-the-art results across scientific disciplines and modeling tasks. They enable back-to-simulation attribution, a new form of mechanistic interpretability.
arXiv Detail & Related papers (2025-07-11T19:18:42Z) - Harnessing Equivariance: Modeling Turbulence with Graph Neural Networks [0.0]
This work proposes a novel methodology for turbulence modeling in Large Eddy Simulation (LES) based on Graph Neural Networks (GNNs). GNNs embed the discrete rotational, reflectional and translational symmetries of the Navier-Stokes equations into the model architecture. The suitability of the proposed approach is investigated for two canonical test cases: Homogeneous Isotropic Turbulence (HIT) and turbulent channel flow.
arXiv Detail & Related papers (2025-04-10T13:37:54Z) - Evidential Uncertainty Probes for Graph Neural Networks [3.5169632430086315]
We propose a plug-and-play framework for uncertainty quantification in Graph Neural Networks (GNNs). Our Evidential Probing Network (EPN) uses a lightweight multi-layer perceptron (MLP) head to extract evidence from learned representations. EPN-reg achieves state-of-the-art performance in accurate and efficient uncertainty quantification, making it suitable for real-world deployment.
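For context, a generic evidential head of the kind this summary describes can be sketched in a few lines. This is a standard Dirichlet-evidence probe on frozen embeddings, offered as an illustration of the idea rather than the paper's EPN implementation (in particular, the regularized objective behind "EPN-reg" is omitted).

```python
import torch
import torch.nn as nn

class EvidentialHead(nn.Module):
    """Lightweight MLP probe: maps frozen GNN node embeddings to Dirichlet
    evidence, from which class probabilities and uncertainty follow."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_classes))

    def forward(self, h):
        evidence = nn.functional.softplus(self.mlp(h))   # non-negative evidence
        alpha = evidence + 1.0                           # Dirichlet parameters
        strength = alpha.sum(-1, keepdim=True)
        probs = alpha / strength                         # expected class probabilities
        vacuity = alpha.size(-1) / strength              # high when evidence is scarce
        return probs, vacuity
```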
arXiv Detail & Related papers (2025-03-11T07:00:54Z) - GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects [55.02281855589641]
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter. In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
arXiv Detail & Related papers (2024-12-23T18:58:17Z) - CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding [62.075029712357]
This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM).
CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models, and weights the guidance with precision weights estimated from the inherent properties of diffusion models.
We apply CogDPM to real-world prediction tasks using the United Kingdom precipitation and surface wind datasets.
arXiv Detail & Related papers (2024-05-03T15:54:50Z) - Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z) - Robust Neural Posterior Estimation and Statistical Model Criticism [1.5749416770494706]
We argue that modellers must treat simulators as idealistic representations of the true data generating process.
In this work we revisit neural posterior estimation (NPE), a class of algorithms that enable black-box parameter inference in simulation models.
We find that the presence of misspecification leads to unreliable inference when NPE is used naively.
arXiv Detail & Related papers (2022-10-12T20:06:55Z) - Learning Stochastic Dynamics with Statistics-Informed Neural Network [0.4297070083645049]
We introduce a machine-learning framework named statistics-informed neural network (SINN) for learning dynamics from data.
We devise mechanisms for training the neural network model to reproduce the correct statistical behavior of a target process.
We show that the obtained reduced-order model can be trained on temporally coarse-grained data and hence is well suited for rare-event simulations.
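As a rough illustration of what "statistics-informed" training can mean, the sketch below (hypothetical, not the authors' objective) penalizes mismatches between low-order statistics of generated and target paths, rather than pointwise trajectory error:

```python
import torch

def stats_loss(sample, target, lags=(1, 2, 5)):
    """Match low-order statistics of generated paths (batch, T) to a
    target process: mean, variance, and a few lagged second moments."""
    loss = (sample.mean() - target.mean()) ** 2 + (sample.var() - target.var()) ** 2
    for k in lags:
        m_s = torch.mean(sample[:, :-k] * sample[:, k:])   # lag-k second moment
        m_t = torch.mean(target[:, :-k] * target[:, k:])
        loss = loss + (m_s - m_t) ** 2
    return loss
```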
arXiv Detail & Related papers (2022-02-24T18:21:01Z) - EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks, EINNs, crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models and the data-driven expressibility afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z) - Likelihood-Free Inference in State-Space Models with Unknown Dynamics [71.94716503075645]
We introduce a method for inferring and predicting latent states in state-space models where observations can only be simulated, and transition dynamics are unknown.
We propose a way of doing likelihood-free inference (LFI) of states and state prediction with a limited number of simulations.
arXiv Detail & Related papers (2021-11-02T12:33:42Z) - Hessian-based toolbox for reliable and interpretable machine learning in physics [58.720142291102135]
We present a toolbox for interpretability and reliability that is agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an agnostic score for the extrapolation of the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z) - A Doubly Stochastic Simulator with Applications in Arrivals Modeling and Simulation [8.808993671472349]
We propose a framework that integrates classical Monte Carlo simulators and Wasserstein generative adversarial networks to model, estimate, and simulate a broad class of arrival processes.
Classical Monte Carlo simulators have advantages in capturing the interpretable "physics" of a Poisson object, whereas neural-network-based simulators have advantages in capturing less-interpretable, complicated dependence within a high-dimensional distribution.
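The "doubly stochastic" structure itself is simple to sketch: a random intensity per period, which could come from any learned generator such as a trained GAN, drives a classical Poisson count. A minimal hypothetical illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def doubly_stochastic_arrivals(n_periods, draw_rate):
    """Doubly stochastic Poisson sketch: a random intensity per period
    (here any callable, e.g. a trained generative model) feeds a
    classical Poisson arrival count."""
    rates = np.array([draw_rate() for _ in range(n_periods)])
    return rng.poisson(rates)

# Example: gamma-distributed intensity yields negative-binomial counts.
counts = doubly_stochastic_arrivals(7, lambda: rng.gamma(shape=5.0, scale=2.0))
```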
arXiv Detail & Related papers (2020-12-27T13:32:16Z) - Theory-guided hard constraint projection (HCP): a knowledge-based data-driven scientific machine learning method [7.778724782015986]
This study proposes theory-guided hard constraint projection (HCP), a model that converts physical constraints, such as governing equations, into a form that is easy to handle through discretization.
The performance of the theory-guided HCP is verified by experiments based on the heterogeneous subsurface flow problem.
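For intuition, the projection step such a method relies on can be written in closed form once the constraints are discretized to a linear system: the raw prediction is minimally corrected onto the set {x : Ax = b}. A small hypothetical sketch (assuming A has full row rank), not the paper's implementation:

```python
import numpy as np

def hard_constraint_projection(x, A, b):
    """Orthogonal projection of a prediction x onto the affine set
    {x : A x = b}, i.e. the smallest correction that enforces the
    discretized physical constraints exactly."""
    residual = A @ x - b
    return x - A.T @ np.linalg.solve(A @ A.T, residual)

A = np.array([[1.0, 1.0, 1.0]])      # e.g. a discretized conservation law
b = np.array([2.0])
x = np.array([0.5, 0.9, 0.8])        # raw network prediction, A @ x = 2.2
print(hard_constraint_projection(x, A, b))  # corrected so A @ x = 2.0 exactly
```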
arXiv Detail & Related papers (2020-12-11T06:17:43Z) - Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z) - Stochastic Graph Neural Networks [123.39024384275054]
Graph neural networks (GNNs) model nonlinear representations in graph data with applications in distributed agent coordination, control, and planning.
Current GNN architectures assume ideal scenarios and ignore link fluctuations that occur due to environment, human factors, or external attacks.
In these situations, the GNN fails at its distributed task if the topological randomness is not accounted for.
arXiv Detail & Related papers (2020-06-04T08:00:00Z)