Related papers: Addressing Misspecification in Simulation-based Inference through Data-driven Calibration

Addressing Misspecification in Simulation-based Inference through Data-driven Calibration

URL: http://arxiv.org/abs/2405.08719v2
Date: Fri, 30 May 2025 12:44:23 GMT
Title: Addressing Misspecification in Simulation-based Inference through Data-driven Calibration
Authors: Antoine Wehenkel, Juan L. Gamella, Ozan Sener, Jens Behrmann, Guillermo Sapiro, Jörn-Henrik Jacobsen, Marco Cuturi,
Abstract summary: This work introduces robust posterior estimation(RoPE), a framework that overcomes model misspecification with a small real-world calibration set of ground-truth parameter measurements.<n>Results on four synthetic tasks and two real-world problems with ground-truth labels demonstrate that RoPE outperforms baselines and consistently returns informative and calibrated credible intervals.
Score: 43.811367860375825
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Driven by steady progress in deep generative modeling, simulation-based inference (SBI) has emerged as the workhorse for inferring the parameters of stochastic simulators. However, recent work has demonstrated that model misspecification can compromise the reliability of SBI, preventing its adoption in important applications where only misspecified simulators are available. This work introduces robust posterior estimation~(RoPE), a framework that overcomes model misspecification with a small real-world calibration set of ground-truth parameter measurements. We formalize the misspecification gap as the solution of an optimal transport~(OT) problem between learned representations of real-world and simulated observations, allowing RoPE to learn a model of the misspecification without placing additional assumptions on its nature. RoPE demonstrates how OT and a calibration set provide a controllable balance between calibrated uncertainty and informative inference, even under severely misspecified simulators. Results on four synthetic tasks and two real-world problems with ground-truth labels demonstrate that RoPE outperforms baselines and consistently returns informative and calibrated credible intervals.

Related papers

G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration [48.948187359727996]
G-Sim is a hybrid framework that automates simulator construction with rigorous empirical calibration.<n>It produces reliable, causally-informed simulators, mitigating data-inefficiency and enabling robust system-level interventions.
arXiv Detail & Related papers (2025-06-10T22:14:34Z)
A Machine Learning Approach to Generate Residual Stress Distributions using Sparse Characterization Data in Friction-Stir Processed Parts [0.0]
Residual stresses, which remain within a component after processing, can deteriorate performance.<n>This work proposes a machine learning (ML) based Residual Stress Generator (RSG) to infer full-field stresses from limited measurements.
arXiv Detail & Related papers (2025-06-09T20:26:57Z)
Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
Testing Generalizability in Causal Inference [3.547529079746247]
There is no formal procedure for statistically evaluating generalizability in machine learning algorithms. We propose a systematic and quantitative framework for evaluating model generalizability in causal inference settings. By basing simulations on real data, our method ensures more realistic evaluations, which is often missing in current work.
arXiv Detail & Related papers (2024-11-05T11:44:00Z)
FUSE: Fast Unified Simulation and Estimation for PDEs [11.991297011923004]
We argue that solving both problems within the same framework can lead to consistent gains in accuracy and robustness. We present the capabilities of the proposed methodology for predicting continuous and discrete biomarkers in full-body haemodynamics simulations.
arXiv Detail & Related papers (2024-05-23T13:37:26Z)
All-in-one simulation-based inference [19.41881319338419]
We present a new amortized inference method -- the Simformer -- which overcomes current limitations. The Simformer outperforms current state-of-the-art amortized inference approaches on benchmark tasks. It can be applied to models with function-valued parameters, it can handle inference scenarios with missing or unstructured data, and it can sample arbitrary conditionals of the joint distribution of parameters and data.
arXiv Detail & Related papers (2024-04-15T10:12:33Z)
Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference ( SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation. In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model. We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models. This paper investigates the robustness of existing CLTR models in complex and diverse situations. We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model. We introduce three robustness indicators and conduct experiments across diverse robust datasets. Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model. By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation. It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
Robust Neural Posterior Estimation and Statistical Model Criticism [1.5749416770494706]
We argue that modellers must treat simulators as idealistic representations of the true data generating process. In this work we revisit neural posterior estimation (NPE), a class of algorithms that enable black-box parameter inference in simulation models. We find that the presence of misspecification, in contrast, leads to unreliable inference when NPE is used naively.
arXiv Detail & Related papers (2022-10-12T20:06:55Z)
Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature. We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance. By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
BayesFlow can reliably detect Model Misspecification and Posterior Errors in Amortized Bayesian Inference [0.0]
We conceptualize the types of model misspecification arising in simulation-based inference and systematically investigate the performance of the BayesFlow framework under these misspecifications. We propose an augmented optimization objective which imposes a probabilistic structure on the latent data space and utilize maximum mean discrepancy (MMD) to detect potentially catastrophic misspecifications.
arXiv Detail & Related papers (2021-12-16T13:25:27Z)
MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio. We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z)
Robust Bayesian Inference for Discrete Outcomes with the Total Variation Distance [5.139874302398955]
Models of discrete-valued outcomes are easily misspecified if the data exhibit zero-inflation, overdispersion or contamination. Here, we introduce a robust discrepancy-based Bayesian approach using the Total Variation Distance (TVD) We empirically demonstrate that our approach is robust and significantly improves predictive performance on a range of simulated and real world data.
arXiv Detail & Related papers (2020-10-26T09:53:06Z)
Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties. We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE) When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
DISCO: Double Likelihood-free Inference Stochastic Control [29.84276469617019]
We propose to leverage the power of modern simulators and recent techniques in Bayesian statistics for likelihood-free inference. The posterior distribution over simulation parameters is propagated through a potentially non-analytical model of the system. Experiments show that the controller proposed attained superior performance and robustness on classical control and robotics tasks.
arXiv Detail & Related papers (2020-02-18T05:29:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.