Towards Trustworthy Amortized Bayesian Model Comparison
- URL: http://arxiv.org/abs/2508.20614v1
- Date: Thu, 28 Aug 2025 10:01:01 GMT
- Title: Towards Trustworthy Amortized Bayesian Model Comparison
- Authors: Šimon Kucharský, Aayush Mishra, Daniel Habermann, Stefan T. Radev, Paul-Christian Bürkner
- Abstract summary: We supplement simulation-based training with a self-consistency loss on unlabeled real data to improve BMC estimates. We compare amortized evidence estimates with and without SC against analytic or bridge sampling benchmarks. SC offers limited gains with neural surrogate likelihoods, making it most practical for trustworthy BMC when likelihoods are exact.
- Score: 8.705960143968882
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Amortized Bayesian model comparison (BMC) enables fast probabilistic ranking of models via simulation-based training of neural surrogates. However, the reliability of neural surrogates deteriorates when simulation models are misspecified - the very case where model comparison is most needed. Thus, we supplement simulation-based training with a self-consistency (SC) loss on unlabeled real data to improve BMC estimates under empirical distribution shifts. Using a numerical experiment and two case studies with real data, we compare amortized evidence estimates with and without SC against analytic or bridge sampling benchmarks. SC improves calibration under model misspecification when having access to analytic likelihoods. However, it offers limited gains with neural surrogate likelihoods, making it most practical for trustworthy BMC when likelihoods are exact.
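The self-consistency idea used here rests on Bayes' rule: log p(y) = log p(y|θ) + log p(θ) - log p(θ|y) holds for every θ, so a surrogate posterior can be penalized whenever this quantity varies across θ. The following is a minimal numpy sketch of a variance-based SC loss on a conjugate Gaussian toy model with an analytic likelihood; the function names are illustrative and this is not the paper's exact training objective.

```python
import numpy as np

def log_norm(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

def self_consistency_loss(log_lik, log_prior, log_post, y, thetas):
    """Variance-based self-consistency loss.

    Bayes' rule implies log p(y) = log p(y|theta) + log p(theta) - log p(theta|y)
    for every theta, so the right-hand side must not depend on theta. The loss
    penalizes its variance across sampled thetas and vanishes exactly when the
    (surrogate) posterior is self-consistent with the likelihood and prior.
    """
    log_evidence = np.array(
        [log_lik(y, t) + log_prior(t) - log_post(t, y) for t in thetas]
    )
    return float(np.var(log_evidence))

# Conjugate Gaussian toy model: theta ~ N(0, 1), y | theta ~ N(theta, 1),
# so the exact posterior is theta | y ~ N(y / 2, 1 / 2).
log_lik = lambda y, t: log_norm(y, t, 1.0)
log_prior = lambda t: log_norm(t, 0.0, 1.0)
exact_post = lambda t, y: log_norm(t, y / 2.0, 0.5)
biased_post = lambda t, y: log_norm(t, y / 2.0, 1.0)  # wrong posterior variance

thetas = np.random.default_rng(0).normal(size=200)
loss_exact = self_consistency_loss(log_lik, log_prior, exact_post, 1.3, thetas)
loss_biased = self_consistency_loss(log_lik, log_prior, biased_post, 1.3, thetas)
```

With the exact posterior the loss is zero up to floating-point error, while the posterior with the wrong variance incurs a clearly positive penalty; in amortized training this term would be evaluated on unlabeled real observations y and minimized alongside the simulation-based loss.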
Related papers
- Improving the Accuracy of Amortized Model Comparison with Self-Consistency [8.705960143968882]
Amortized Bayesian inference (ABI) offers fast, scalable approximations to posterior densities by training neural surrogates on data simulated from the statistical model. When observed data fall outside the training distribution, neural surrogates can behave unpredictably. Recent work on self-consistency (SC) provides a promising remedy to this issue, accessible even for empirical data.
arXiv Detail & Related papers (2025-12-16T11:25:40Z)
- metabeta - A fast neural model for Bayesian mixed-effects regression [22.95834831696185]
We propose metabeta, a transformer-based neural network model for mixed-effects regression. We show that it reaches stable performance comparable to MCMC-based parameter estimation at a fraction of the usually required time.
arXiv Detail & Related papers (2025-10-08T19:20:00Z)
- Uncertainty-Aware Surrogate-based Amortized Bayesian Inference for Computationally Expensive Models [1.5511264120614792]
We propose Uncertainty-Aware Surrogate-based Amortized Bayesian Inference (UA-SABI). Our experiments show that this approach enables reliable, fast, and repeated Bayesian inference for computationally expensive models, even under tight time constraints.
arXiv Detail & Related papers (2025-05-13T15:44:10Z)
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm that incorporates score matching. We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy. Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [71.04084063541777]
Counterfactual learning to rank has attracted extensive attention in the IR community. Models can be theoretically unbiased when the user behavior assumption is correct and the propensity estimation is accurate. Their effectiveness is usually empirically evaluated via simulation-based experiments due to a lack of widely available, large-scale, real click logs.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines, allowing reliable black-box posterior inference.
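To make the relaxation idea above concrete: for a calibrated posterior, the rank of the true parameter among posterior samples is uniformly distributed, and the hard rank (a sum of indicators) can be smoothed with a sigmoid so the calibration penalty becomes differentiable. The following numpy sketch illustrates this under stated assumptions; the function names and the Cramér-von-Mises-style scoring are illustrative, not the paper's exact objective.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relaxed_calibration_penalty(theta_true, theta_samples, tau=0.05):
    """Differentiable surrogate for calibration (coverage) error.

    For a calibrated posterior, the rank of the true parameter among
    posterior samples is Uniform(0, 1). The hard rank (an indicator sum)
    is relaxed with a sigmoid of temperature tau so the penalty can be
    backpropagated; sorted soft ranks are then scored against uniform
    quantiles (a relaxed Cramer-von-Mises-style statistic).
    """
    # soft rank: smoothed fraction of posterior samples below the true value
    soft_ranks = sigmoid((theta_true[:, None] - theta_samples) / tau).mean(axis=1)
    n = len(soft_ranks)
    uniform_grid = (np.arange(n) + 0.5) / n
    return float(np.mean((np.sort(soft_ranks) - uniform_grid) ** 2))

rng = np.random.default_rng(1)
theta_true = rng.normal(size=500)                  # true parameters ~ N(0, 1)
calibrated = rng.normal(size=(500, 200))           # posterior samples match N(0, 1)
overconfident = 0.1 * rng.normal(size=(500, 200))  # too-narrow posterior

penalty_ok = relaxed_calibration_penalty(theta_true, calibrated)
penalty_bad = relaxed_calibration_penalty(theta_true, overconfident)
```

The calibrated posterior yields a penalty near zero, while the overconfident one produces U-shaped ranks and a much larger penalty; because every operation is smooth, the same quantity could be added to a neural training objective.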
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
- End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control [45.84205238554709]
We present a method for reinforcement learning of Koopman surrogate models for optimal performance as part of (e)NMPC.
We show that the end-to-end trained models outperform those trained using system identification in (e)NMPC.
arXiv Detail & Related papers (2023-08-03T10:21:53Z)
- Maximum Likelihood Learning of Unnormalized Models for Simulation-Based Inference [44.281860162298564]
We introduce two synthetic likelihood methods for Simulation-Based Inference.
We learn a conditional energy-based model (EBM) of the likelihood using synthetic data generated by the simulator.
We demonstrate the properties of both methods on a range of synthetic datasets, and apply them to a model of the pyloric network in the crab.
arXiv Detail & Related papers (2022-10-26T14:38:24Z)
- Robust Neural Posterior Estimation and Statistical Model Criticism [1.5749416770494706]
We argue that modellers must treat simulators as idealistic representations of the true data generating process.
In this work we revisit neural posterior estimation (NPE), a class of algorithms that enable black-box parameter inference in simulation models.
We find that the presence of misspecification leads to unreliable inference when NPE is used naively.
arXiv Detail & Related papers (2022-10-12T20:06:55Z)
- BSM loss: A superior way in modeling aleatory uncertainty of fine-grained classification [0.0]
We propose a modified Bootstrapping loss (BS loss) function with a Mixup data augmentation strategy.
Our experiments indicated that the BS loss with Mixup (BSM) model can halve the Expected Calibration Error (ECE) compared to standard data augmentation.
The BSM model is able to perceive the semantic distance of out-of-domain data, demonstrating high potential in real-world clinical practice.
arXiv Detail & Related papers (2022-06-09T13:06:51Z)
- Model Comparison in Approximate Bayesian Computation [0.456877715768796]
A common problem in natural sciences is the comparison of competing models in the light of observed data.
Bayesian model comparison relies on the calculation of likelihood functions, which are intractable for most models used in practice.
I propose a new efficient method to perform Bayesian model comparison in ABC.
arXiv Detail & Related papers (2022-03-15T10:24:16Z)
- CGAN-EB: A Non-parametric Empirical Bayes Method for Crash Hotspot Identification Using Conditional Generative Adversarial Networks: A Simulated Crash Data Study [2.3204178451683264]
A new non-parametric empirical Bayes approach called CGAN-EB is proposed for approximating empirical Bayes (EB) estimates in traffic locations.
Its performance is compared in a simulation study with the traditional approach based on the negative binomial model (NB-EB).
arXiv Detail & Related papers (2021-12-13T16:02:47Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.