Model-Free Assessment of Simulator Fidelity via Quantile Curves
- URL: http://arxiv.org/abs/2512.05024v1
- Date: Thu, 04 Dec 2025 17:39:51 GMT
- Title: Model-Free Assessment of Simulator Fidelity via Quantile Curves
- Authors: Garud Iyengar, Yu-Shiou Willy Lin, Kaizheng Wang
- Abstract summary: Simulation of complex systems originated in manufacturing and queuing applications. It is now widely used for large-scale, ML-based systems in research, education, and consumer surveys. We propose a computationally tractable method to estimate the quantile function of the discrepancy between the simulated and ground-truth outcome distributions.
- Score: 12.483260526189449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simulation of complex systems originated in manufacturing and queuing applications. It is now widely used for large-scale, ML-based systems in research, education, and consumer surveys. However, characterizing the discrepancy between simulators and ground truth remains challenging for increasingly complex, machine-learning-based systems. We propose a computationally tractable method to estimate the quantile function of the discrepancy between the simulated and ground-truth outcome distributions. Our approach focuses on output uncertainty and treats the simulator as a black box, imposing no modeling assumptions on its internals, and hence applies broadly across many parameter families, from Bernoulli and multinomial models to continuous, vector-valued settings. The resulting quantile curve supports confidence interval construction for unseen scenarios, risk-aware summaries of sim-to-real discrepancy (e.g., VaR/CVaR), and comparison of simulators' performance. We demonstrate our methodology in an application assessing LLM simulation fidelity on the WorldValueBench dataset spanning four LLMs.
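To make the quantities named in the abstract concrete, the following is a minimal sketch of an empirical quantile curve of sim-to-real discrepancy with VaR and CVaR summaries. It is not the authors' estimator: it assumes Bernoulli outcomes, takes the per-scenario discrepancy to be the absolute difference between an observed real rate and a simulated rate, and ignores the sampling noise in those rates. The function names and the toy numbers are hypothetical.

```python
import math


def discrepancies(real_rates, sim_rates):
    """Per-scenario discrepancy: |real rate - simulated rate| (an assumed choice)."""
    if len(real_rates) != len(sim_rates):
        raise ValueError("need one real and one simulated rate per scenario")
    return [abs(r - s) for r, s in zip(real_rates, sim_rates)]


def empirical_quantile(values, alpha):
    """Smallest sample value v such that at least a fraction alpha of values are <= v."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(alpha * len(ordered))  # 1-based rank of the quantile
    return ordered[rank - 1]


def cvar(values, alpha):
    """Mean of the discrepancies at or above the alpha-quantile (one common convention)."""
    var = empirical_quantile(values, alpha)
    tail = [v for v in values if v >= var]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Toy data: four scenarios, e.g. four survey questions (made-up numbers).
    real = [0.5, 0.25, 0.75, 0.5]
    sim = [0.5, 0.5, 0.25, 0.375]
    d = discrepancies(real, sim)
    for alpha in (0.5, 0.75, 1.0):
        print(alpha, empirical_quantile(d, alpha))
    print("CVaR at 0.75:", cvar(d, 0.75))
```

Reading the quantile at level alpha gives a VaR-style bound on the discrepancy for a fraction alpha of scenarios, and comparing two simulators amounts to comparing their curves level by level. A faithful implementation would also need to account for finite-sample uncertainty, which this sketch omits.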
Related papers
- Quantifying and Attributing Submodel Uncertainty in Stochastic Simulation Models and Digital Twins [0.1234398109349733]
This paper investigates how submodel uncertainty affects the estimation of system performance metrics. We develop a framework for quantifying submodel uncertainty in simulation models and extend the framework to digital-twin settings.
arXiv Detail & Related papers (2026-02-18T00:06:39Z) - SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors [58.87134689752605]
We introduce SimBench, the first large-scale, standardized benchmark for a robust, reproducible science of LLM simulation. We show that even the best LLMs today have limited simulation ability (score: 40.80/100) and that performance scales log-linearly with model size. We demonstrate that simulation ability correlates most strongly with deep, knowledge-intensive reasoning.
arXiv Detail & Related papers (2025-10-20T13:14:38Z) - G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration [48.948187359727996]
G-Sim is a hybrid framework that automates simulator construction with rigorous empirical calibration. It produces reliable, causally-informed simulators, mitigating data-inefficiency and enabling robust system-level interventions.
arXiv Detail & Related papers (2025-06-10T22:14:34Z) - Transfer learning for multifidelity simulation-based inference in cosmology [0.0]
Pre-training on dark-matter-only $N$-body simulations reduces the required number of high-fidelity hydrodynamical simulations by a factor between $8$ and $15$. By leveraging cheaper simulations, our approach enables performant and accurate inference on high-fidelity models while substantially reducing computational costs.
arXiv Detail & Related papers (2025-05-27T14:04:30Z) - Multifidelity Simulation-based Inference for Computationally Expensive Simulators [7.065679767112407]
We introduce MF-(TS)NPE, a multifidelity approach to neural posterior estimation that uses transfer learning to leverage inexpensive low-fidelity simulations. We further improve simulation efficiency by introducing A-MF-TSNPE, a sequential variant that uses an acquisition function targeting the predictive uncertainty of the density estimator.
arXiv Detail & Related papers (2025-02-12T13:59:22Z) - Active Sequential Posterior Estimation for Sample-Efficient Simulation-Based Inference [12.019504660711231]
We introduce active sequential neural posterior estimation (ASNPE). ASNPE brings an active learning scheme into the inference loop to estimate the utility of simulation parameter candidates to the underlying probabilistic model. Our method outperforms well-tuned benchmarks and state-of-the-art posterior estimation methods on a large-scale real-world traffic network.
arXiv Detail & Related papers (2024-12-07T08:57:26Z) - NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking [65.24988062003096]
We present NAVSIM, a framework for benchmarking vision-based driving policies.
Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other.
NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights.
arXiv Detail & Related papers (2024-06-21T17:59:02Z) - Optimising Highly-Parallel Simulation-Based Verification of Cyber-Physical Systems [0.0]
Cyber-Physical Systems (CPSs) arise in many industry-relevant domains and are often mission- or safety-critical.
System-Level Verification (SLV) of CPSs aims at certifying that given (e.g. safety or liveness) specifications are met or at estimating the value of some.
arXiv Detail & Related papers (2023-07-28T08:08:27Z) - In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z) - Synthetic Data-Based Simulators for Recommender Systems: A Survey [55.60116686945561]
This survey aims at providing a comprehensive overview of the recent trends in the field of modeling and simulation.
We start with the motivation behind the development of frameworks implementing the simulations -- simulators.
We provide a new consistent classification of existing simulators based on their functionality, approbation, and industrial effectiveness.
arXiv Detail & Related papers (2022-06-22T19:33:21Z) - Multi-fidelity Hierarchical Neural Processes [79.0284780825048]
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs.
We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling.
We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z) - A Doubly Stochastic Simulator with Applications in Arrivals Modeling and Simulation [8.808993671472349]
We propose a framework that integrates classical Monte Carlo simulators and Wasserstein generative adversarial networks to model, estimate, and simulate a broad class of arrival processes.
Classical Monte Carlo simulators have advantages at capturing interpretable "physics" of a Poisson object, whereas neural-network-based simulators have advantages at capturing less-interpretable complicated dependence within a high-dimensional distribution.
arXiv Detail & Related papers (2020-12-27T13:32:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.