Related papers: Agentic Scientific Simulation: Execution-Grounded Model Construction and Reconstruction

URL: http://arxiv.org/abs/2603.00214v1
Date: Fri, 27 Feb 2026 15:42:05 GMT
Title: Agentic Scientific Simulation: Execution-Grounded Model Construction and Reconstruction
Authors: Knut-Andreas Lie, Olav Møyner, Elling Svee, Jakob Torben
Abstract summary: This paper investigates agentic scientific simulation, where model construction is organized as an execution-grounded interpret-act-validate loop. We present JutulGPT, a reference implementation built on the fully differentiable Julia-based reservoir simulator JutulDarcy.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM agents are increasingly used for code generation, but physics-based simulation poses a deeper challenge: natural-language descriptions of simulation models are inherently underspecified, and different admissible resolutions of implicit choices produce physically valid but scientifically distinct configurations. Without explicit detection and resolution of these ambiguities, neither the correctness of the result nor its reproducibility from the original description can be assured. This paper investigates agentic scientific simulation, where model construction is organized as an execution-grounded interpret-act-validate loop and the simulator serves as the authoritative arbiter of physical validity rather than merely a runtime. We present JutulGPT, a reference implementation built on the fully differentiable Julia-based reservoir simulator JutulDarcy. The agent combines structured retrieval of documentation and examples with code synthesis, static analysis, execution, and systematic interpretation of solver diagnostics. Underspecified modelling choices are detected explicitly and resolved either autonomously (with logged assumptions) or through targeted user queries. The results demonstrate that agent-mediated model construction can be grounded in simulator validation, while also revealing a structural limitation: choices resolved tacitly through simulator defaults are invisible to the assumption log and to any downstream representation. A secondary experiment with autonomous reconstruction of a reference model from progressively abstract textual descriptions shows that reconstruction variability exposes latent degrees of freedom in simulation descriptions and provides a practical methodology for auditing reproducibility. All code, prompts, and agent logs are publicly available.

Related papers

Rethinking Scientific Modeling: Toward Physically Consistent and Simulation-Executable Programmatic Generation [8.067859101380389]
Non-executable or physically inconsistent outputs remain prevalent under stringent engineering constraints. A framework for physics-consistent automatic building modeling is proposed. CivilInstruct is introduced as a domain-specific dataset that formalizes structural engineering knowledge and constraint reasoning. MBEval is presented as a verification-driven benchmark that evaluates executability and structural dynamics consistency.
arXiv Detail & Related papers (2026-02-06T06:57:04Z)
Multi-Agent Procedural Graph Extraction with Structural and Logical Refinement [66.51979814832332]
The model formulates procedural graph extraction as a multi-round reasoning process with dedicated structural and logical refinement. Experiments demonstrate that the model achieves substantial improvements in both structural correctness and logical consistency over strong baselines.
arXiv Detail & Related papers (2026-01-27T04:00:48Z)
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors [58.87134689752605]
We introduce SimBench, the first large-scale, standardized benchmark for a robust, reproducible science of LLM simulation. We show that even the best LLMs today have limited simulation ability (score: 40.80/100) and that performance scales log-linearly with model size. We demonstrate that simulation ability correlates most strongly with deep, knowledge-intensive reasoning.
arXiv Detail & Related papers (2025-10-20T13:14:38Z)
G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration [48.948187359727996]
G-Sim is a hybrid framework that automates simulator construction with rigorous empirical calibration. It produces reliable, causally-informed simulators, mitigating data-inefficiency and enabling robust system-level interventions.
arXiv Detail & Related papers (2025-06-10T22:14:34Z)
Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects [55.02281855589641]
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter. In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
arXiv Detail & Related papers (2024-12-23T18:58:17Z)
Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space. Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
All-in-one simulation-based inference [19.41881319338419]
We present a new amortized inference method -- the Simformer -- which overcomes current limitations. The Simformer outperforms current state-of-the-art amortized inference approaches on benchmark tasks. It can be applied to models with function-valued parameters, it can handle inference scenarios with missing or unstructured data, and it can sample arbitrary conditionals of the joint distribution of parameters and data.
arXiv Detail & Related papers (2024-04-15T10:12:33Z)
Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations [0.0]
Self-supervised learning is the backbone of state-of-the-art language modeling. It has been argued that training with predictive loss on a self-supervised dataset causes simulators.
arXiv Detail & Related papers (2023-11-29T09:32:56Z)
DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps [46.58231605323107]
We propose DeforestVis, a visual analytics tool that offers summarization of the behaviour of complex ML models. DeforestVis helps users to explore the complexity versus fidelity trade-off by incrementally generating more stumps. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers.
arXiv Detail & Related papers (2023-03-31T21:17:15Z)
DISCO: Double Likelihood-free Inference Stochastic Control [29.84276469617019]
We propose to leverage the power of modern simulators and recent techniques in Bayesian statistics for likelihood-free inference. The posterior distribution over simulation parameters is propagated through a potentially non-analytical model of the system. Experiments show that the proposed controller attained superior performance and robustness on classical control and robotics tasks.
arXiv Detail & Related papers (2020-02-18T05:29:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.