Related papers: Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation

Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation

URL: http://arxiv.org/abs/2601.22367v1
Date: Thu, 29 Jan 2026 22:20:47 GMT
Title: Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation
Authors: Shiyi Sun, Geoff K. Nicholls, Jeong Eun Lee,
Abstract summary: Generalized Bayesian Inference (GBI) tempers a loss with a temperature $>0$ to mitigate overconfidence and improve under model misspecification.<n>We give the first fully amortized variational approximation to the tempered posterior family $p_(mid x) propto (),p(x mid )$ by training a single $(x,)$-conditioned neural posterior estimator $q_(mid x,)$ that enables sampling in a single forward pass.
Score: 1.096028999747108
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generalized Bayesian Inference (GBI) tempers a loss with a temperature $β>0$ to mitigate overconfidence and improve robustness under model misspecification, but existing GBI methods typically rely on costly MCMC or SDE-based samplers and must be re-run for each new dataset and each $β$ value. We give the first fully amortized variational approximation to the tempered posterior family $p_β(θ\mid x) \propto π(θ)\,p(x \mid θ)^β$ by training a single $(x,β)$-conditioned neural posterior estimator $q_φ(θ\mid x,β)$ that enables sampling in a single forward pass, without simulator calls or inference-time MCMC. We introduce two complementary training routes: (i) synthesize off-manifold samples $(θ,x) \sim π(θ)\,p(x \mid θ)^β$ and (ii) reweight a fixed base dataset $π(θ)\,p(x \mid θ)$ using self-normalized importance sampling (SNIS). We show that the SNIS-weighted objective provides a consistent forward-KL fit to the tempered posterior with finite weight variance. Across four standard simulation-based inference (SBI) benchmarks, including the chaotic Lorenz-96 system, our $β$-amortized estimator achieves competitive posterior approximations in standard two-sample metrics, matching non-amortized MCMC-based power-posterior samplers over a wide range of temperatures.

Related papers

Optimal Unconstrained Self-Distillation in Ridge Regression: Strict Improvements, Precise Asymptotics, and One-Shot Tuning [61.07540493350384]
Self-distillation (SD) is the process of retraining a student on a mixture of ground-truth and the teacher's own predictions.<n>We show that for any prediction risk, the optimally mixed student improves upon the ridge teacher for every regularization level.<n>We propose a consistent one-shot tuning method to estimate $star$ without grid search, sample splitting, or refitting.
arXiv Detail & Related papers (2026-02-19T17:21:15Z)
Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum [62.691095807959215]
We establish an optimal sample complexity of $O(-2)$ for obtaining an $$-optimal global policy using a single-timescale actor-critic (AC) algorithm.<n>These mechanisms are compatible with existing deep learning architectures and require only minor modifications, without compromising practical applicability.
arXiv Detail & Related papers (2026-02-02T00:35:42Z)
Inferring Cosmological Parameters with Evidential Physics-Informed Neural Networks [0.0]
We use a novel variant of Physics-Informed Neural Networks to predict cosmological parameters from recent supernovae and baryon acoustic oscillations (BAO) datasets.<n>Our machine learning framework generates uncertainty estimates for target variables and the inferred unknown parameters of the underlying PDE descriptions.
arXiv Detail & Related papers (2025-09-29T06:25:53Z)
Precise Asymptotics of Bagging Regularized M-estimators [20.077783679095443]
We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators.<n>Key to our analysis is a new result on the joint behavior of correlations between the estimator and residual errors on overlapping subsamples.<n>Joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data.
arXiv Detail & Related papers (2024-09-23T17:48:28Z)
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks. In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning [78.30395044401321]
We develop a novel model-based approach to reinforcement learning (MBRL) It relaxes the assumptions on the target transition model to belong to a generic family of mixture models. It can achieve up-to 50 percent reduction in wall clock time in some continuous control environments.
arXiv Detail & Related papers (2022-06-02T17:27:49Z)
Variational Inference for Bayesian Bridge Regression [0.0]
We study the implementation of Automatic Differentiation Variational inference (ADVI) for Bayesian inference on regression models with bridge penalization. The bridge approach uses $ell_alpha$ norm, with $alpha in (0, +infty)$ to define a penalization on large values of the regression coefficients. We illustrate the approach on non-parametric regression models with B-splines, although the method works seamlessly for other choices of basis functions.
arXiv Detail & Related papers (2022-05-19T12:29:09Z)
Observable adjustments in single-index models for regularized M-estimators [3.5353632767823506]
In regime where sample size $n$ and dimension $p$ are both increasing, the behavior of the empirical distribution of $hatbeta$ and the predicted values $Xhatbeta$ has been previously characterized. This paper develops a different theory to describe the empirical distribution of $hatbeta$ and $Xhatbeta$.
arXiv Detail & Related papers (2022-04-14T14:32:02Z)
$p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses. We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
The Generalized Lasso with Nonlinear Observations and Generative Priors [63.541900026673055]
We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models. We show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property.
arXiv Detail & Related papers (2020-06-22T16:43:35Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP) We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.