Robust Simulation-Based Inference under Missing Data via Neural Processes
- URL: http://arxiv.org/abs/2503.01287v1
- Date: Mon, 03 Mar 2025 08:22:01 GMT
- Title: Robust Simulation-Based Inference under Missing Data via Neural Processes
- Authors: Yogesh Verma, Ayush Bharti, Vikas Garg
- Abstract summary: We formalize the problem of missing data in SBI and demonstrate that naive imputation methods can introduce bias in the estimation of the SBI posterior. We also introduce a novel amortized method that addresses this issue by jointly learning the imputation model and the inference network within a neural posterior estimation framework.
- Score: 6.32765579505162
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Simulation-based inference (SBI) methods typically require fully observed data to infer parameters of models with intractable likelihood functions. However, datasets often contain missing values due to incomplete observations, data corruptions (common in astrophysics), or instrument limitations (e.g., in high-energy physics applications). In such scenarios, missing data must be imputed before applying any SBI method. We formalize the problem of missing data in SBI and demonstrate that naive imputation methods can introduce bias in the estimation of the SBI posterior. We also introduce a novel amortized method that addresses this issue by jointly learning the imputation model and the inference network within a neural posterior estimation (NPE) framework. Extensive empirical results on SBI benchmarks show that our approach provides robust inference outcomes compared to standard baselines for varying levels of missing data. Moreover, we demonstrate the merits of our imputation model on two real-world bioactivity datasets (Adrenergic and Kinase assays). Code is available at https://github.com/Aalto-QuML/RISE.
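The joint imputation-plus-inference idea can be illustrated with a minimal sketch. All module and function names below (simulate, ImputeNet, PosteriorNet) are hypothetical stand-ins, not the released RISE code: an imputation network fills in artificially masked simulator outputs, and a simple amortized posterior is trained on the imputed data, with both networks optimized jointly under a single NPE-style loss.

```python
# Minimal sketch (assumed names, not the RISE implementation): jointly training an
# imputation network and a neural posterior estimator on masked simulated data.
import torch
import torch.nn as nn

def simulate(n):
    """Toy simulator: theta ~ N(0, I), x = repeated theta plus noise (placeholder)."""
    theta = torch.randn(n, 2)
    x = theta.repeat(1, 2) + 0.1 * torch.randn(n, 4)
    return theta, x

class ImputeNet(nn.Module):
    """Predicts values for missing entries from the observed entries and the mask."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x, mask):
        filled = self.net(torch.cat([x * mask, mask], dim=-1))
        return x * mask + filled * (1 - mask)   # keep observed entries untouched

class PosteriorNet(nn.Module):
    """Gaussian conditional density q(theta | x); a normalizing flow would be used in practice."""
    def __init__(self, x_dim, theta_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 2 * theta_dim))
    def log_prob(self, theta, x):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_sigma.exp()).log_prob(theta).sum(-1)

imputer, posterior = ImputeNet(4), PosteriorNet(4, 2)
opt = torch.optim.Adam(list(imputer.parameters()) + list(posterior.parameters()), lr=1e-3)

for step in range(2000):
    theta, x = simulate(128)
    mask = (torch.rand_like(x) > 0.3).float()            # simulate ~30% missingness
    x_imputed = imputer(x, mask)
    loss = -posterior.log_prob(theta, x_imputed).mean()  # single joint NPE-style objective
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the imputer is trained through the same posterior loss, the imputations are optimized for downstream inference rather than for reconstruction alone, which is the point the abstract makes about naive imputation introducing bias.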
Related papers
- Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models [5.952993835541411]
We show how TabPFN can be used as a pre-trained autoregressive conditional density estimator for simulation-based inference.
NPE-PF eliminates the need for inference network selection, training, and hyperparameter tuning.
It exhibits superior robustness to model misspecification and can be scaled to simulation budgets that exceed the context size limit of TabPFN.
arXiv Detail & Related papers (2025-04-24T15:29:39Z) - Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data [2.9434969286228494]
We propose a semi-supervised approach that enables training not only on (labeled) simulated data generated from the model, but also on unlabeled data originating from any source, including real-world data.
The results of our initial experiments show remarkable improvements in the robustness of ABI on out-of-simulation data.
If our findings also generalize to other scenarios and model classes, we believe that our new method represents a major breakthrough in neural ABI.
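A rough sketch of the general self-consistency idea on unlabeled data follows, assuming that both an amortized posterior (q_post) and an amortized likelihood (q_lik) are available; these objects and their interfaces are hypothetical, and the cited paper's exact loss may differ. For a fixed observation x, the quantity log p(theta) + log q_lik(x|theta) - log q_post(theta|x) should be constant in theta (equal to log p(x)), so its variance across sampled parameters can be penalized.

```python
# Hedged sketch of a self-consistency penalty on unlabeled observations.
# prior, q_post, q_lik are assumed objects exposing sample/log_prob; not the paper's API.
import torch

def self_consistency_loss(x_unlabeled, prior, q_post, q_lik, n_theta=16):
    losses = []
    for x in x_unlabeled:                      # x: (x_dim,)
        theta = prior.sample((n_theta,))       # candidate parameters
        log_ratio = (prior.log_prob(theta)
                     + q_lik.log_prob(x.expand(n_theta, -1), theta)
                     - q_post.log_prob(theta, x.expand(n_theta, -1)))
        losses.append(log_ratio.var())         # zero variance <=> self-consistent
    return torch.stack(losses).mean()
```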
arXiv Detail & Related papers (2025-01-23T08:57:02Z) - A Comprehensive Guide to Simulation-based Inference in Computational Biology [5.333122501732079]
This paper provides comprehensive guidelines for deciding between SBI approaches for complex biological models.
We apply the guidelines to two agent-based models that describe cellular dynamics using real-world data.
Our study unveils a critical insight: while neural SBI methods demand significantly fewer simulations to produce inference results, they tend to yield biased estimates.
arXiv Detail & Related papers (2024-09-29T12:04:03Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Calibrating Neural Simulation-Based Inference with Differentiable
Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error, we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines, allowing reliable black-box posterior inference.
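A minimal sketch of how such a relaxed calibration term could enter an NPE objective is given below. The soft-rank construction and the q_post interface (log_prob/sample) are illustrative assumptions rather than the paper's exact relaxation.

```python
# Hedged sketch: a differentiable calibration penalty added to an NPE loss.
# The indicator inside the coverage statistic is softened with a sigmoid so it can
# be backpropagated; q_post is an assumed amortized posterior object.
import torch

def soft_calibration_penalty(q_post, theta, x, n_samples=32, temperature=10.0):
    # Soft "rank" of the true theta: fraction of posterior samples whose learned
    # log-density exceeds that of theta, with a sigmoid replacing the hard indicator.
    true_logp = q_post.log_prob(theta, x)                           # (batch,)
    samples = q_post.sample(x, n_samples)                           # (n_samples, batch, theta_dim)
    sample_logp = q_post.log_prob(samples, x.expand(n_samples, *x.shape))
    soft_rank = torch.sigmoid(temperature * (sample_logp - true_logp)).mean(0)
    # For a calibrated posterior the ranks are uniform on [0, 1]; penalize deviation
    # of their empirical mean and variance from the uniform moments (1/2 and 1/12).
    return (soft_rank.mean() - 0.5) ** 2 + (soft_rank.var() - 1.0 / 12.0) ** 2

# total_loss = npe_loss + lam * soft_calibration_penalty(q_post, theta, x)
```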
arXiv Detail & Related papers (2023-10-20T10:20:45Z) - Simulation-based Inference for Cardiovascular Models [43.55219268578912]
We use simulation-based inference to solve the inverse problem of mapping waveforms back to plausible physiological parameters.
We perform an in-silico uncertainty analysis of five biomarkers of clinical interest.
We study the gap between in-vivo and in-silico data using the MIMIC-III waveform database.
arXiv Detail & Related papers (2023-07-26T02:34:57Z) - Learning Robust Statistics for Simulation-based Inference under Model Misspecification [23.331522354991527]
We propose the first general approach to handle model misspecification that works across different classes of simulation-based inference methods.
We show that our method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
arXiv Detail & Related papers (2023-05-25T09:06:26Z) - Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy-to-interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on normalizing flows (NF).
It also offers theoretical guarantees based on results of local consistency.
This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z) - Efficient identification of informative features in simulation-based inference [5.538076164981993]
We show that one can marginalize the trained surrogate likelihood post-hoc before inferring the posterior to assess the contribution of a feature.
We demonstrate the usefulness of our method by identifying the most important features for inferring parameters of an example Hodgkin-Huxley (HH) neuron model.
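To make the post-hoc marginalization concrete, here is a minimal sketch under the simplifying assumption of a Gaussian surrogate likelihood over the summary features, for which dropping a feature is exact; the cited work applies the same post-hoc idea to neural density estimators, and the function names here are illustrative only.

```python
# Hedged sketch: with a (conditionally) Gaussian surrogate q(s | theta) = N(mu, cov),
# marginalizing out one summary feature post hoc amounts to dropping the corresponding
# mean entry and covariance row/column.
import torch
from torch.distributions import MultivariateNormal

def marginalized_log_likelihood(mu, cov, s_obs, drop_idx):
    """log q(s_{-j} | theta) after removing feature drop_idx from the surrogate."""
    keep = torch.tensor([i for i in range(mu.shape[-1]) if i != drop_idx])
    mu_m = mu[..., keep]
    cov_m = cov[..., keep, :][..., :, keep]
    return MultivariateNormal(mu_m, covariance_matrix=cov_m).log_prob(s_obs[..., keep])

# Comparing the posterior obtained with and without feature j then quantifies how
# informative that feature is for the parameters.
```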
arXiv Detail & Related papers (2022-10-21T12:35:46Z) - MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of the observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
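The expectation-maximization structure can be sketched as follows, with a plain Gaussian model and conditional-mean imputation standing in for MissDAG's actual E-step and DAG-learning M-step; all function names are illustrative, not the authors' code.

```python
# Hedged sketch of the EM idea: E-step completes missing entries with their conditional
# expectation under the current Gaussian model; M-step refits the model on the completed
# data (MissDAG would instead run a causal-discovery / DAG-learning step here).
import torch

def conditional_mean_impute(X, mask, mu, cov):
    """Fill missing entries (mask == 0) with E[x_mis | x_obs] under N(mu, cov)."""
    X_filled = X.clone()
    for i in range(X.shape[0]):
        mis = (mask[i] == 0).nonzero(as_tuple=True)[0]
        obs = (mask[i] == 1).nonzero(as_tuple=True)[0]
        if len(mis) == 0 or len(obs) == 0:
            continue
        cov_mo = cov[mis][:, obs]
        cov_oo = cov[obs][:, obs]
        delta = (X[i, obs] - mu[obs]).unsqueeze(-1)
        X_filled[i, mis] = mu[mis] + (cov_mo @ torch.linalg.solve(cov_oo, delta)).squeeze(-1)
    return X_filled

def em_step(X, mask, mu, cov):
    X_filled = conditional_mean_impute(X, mask, mu, cov)    # E-step (point version)
    mu_new = X_filled.mean(0)                               # M-step: refit the model;
    cov_new = torch.cov(X_filled.T) + 1e-3 * torch.eye(X.shape[1])
    return X_filled, mu_new, cov_new                        # a DAG would be fitted here
```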
arXiv Detail & Related papers (2022-05-27T09:59:46Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputations of a baseline method by simultaneously modeling the missingness-generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
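A heavily simplified sketch of the alternating refinement structure follows; the mask-prediction network below is only a stand-in for MIRACLE's causal and missingness-mechanism regularizer, and none of the names correspond to the released implementation.

```python
# Hedged sketch: refine imputations while jointly fitting a model of the missingness
# pattern, used here as a simple stand-in regularizer (toy data, illustrative names).
import torch
import torch.nn as nn

d = 8
imputer = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, d))
mask_model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))  # predicts missingness
opt = torch.optim.Adam(list(imputer.parameters()) + list(mask_model.parameters()), lr=1e-3)

X = torch.randn(256, d)                      # toy "complete" data
mask = (torch.rand_like(X) > 0.2).float()    # 1 = observed, 0 = missing
X_obs = X * mask

for step in range(1000):
    X_hat = imputer(torch.cat([X_obs, mask], dim=-1))
    X_imp = X_obs + (1 - mask) * X_hat                       # keep observed values
    recon = ((mask * (X_hat - X_obs)) ** 2).mean()           # fit the observed entries
    mech = nn.functional.binary_cross_entropy_with_logits(   # missingness-mechanism term
        mask_model(X_imp), mask)
    loss = recon + 0.1 * mech
    opt.zero_grad(); loss.backward(); opt.step()
```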
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference models on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform traditional two-step imputation-based predictions that use state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)