Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation
- URL: http://arxiv.org/abs/2601.22367v1
- Date: Thu, 29 Jan 2026 22:20:47 GMT
- Title: Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation
- Authors: Shiyi Sun, Geoff K. Nicholls, Jeong Eun Lee,
- Abstract summary: Generalized Bayesian Inference (GBI) tempers a loss with a temperature $>0$ to mitigate overconfidence and improve under model misspecification.<n>We give the first fully amortized variational approximation to the tempered posterior family $p_(mid x) propto (),p(x mid )$ by training a single $(x,)$-conditioned neural posterior estimator $q_(mid x,)$ that enables sampling in a single forward pass.
- Score: 1.096028999747108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalized Bayesian Inference (GBI) tempers a loss with a temperature $β>0$ to mitigate overconfidence and improve robustness under model misspecification, but existing GBI methods typically rely on costly MCMC or SDE-based samplers and must be re-run for each new dataset and each $β$ value. We give the first fully amortized variational approximation to the tempered posterior family $p_β(θ\mid x) \propto π(θ)\,p(x \mid θ)^β$ by training a single $(x,β)$-conditioned neural posterior estimator $q_φ(θ\mid x,β)$ that enables sampling in a single forward pass, without simulator calls or inference-time MCMC. We introduce two complementary training routes: (i) synthesize off-manifold samples $(θ,x) \sim π(θ)\,p(x \mid θ)^β$ and (ii) reweight a fixed base dataset $π(θ)\,p(x \mid θ)$ using self-normalized importance sampling (SNIS). We show that the SNIS-weighted objective provides a consistent forward-KL fit to the tempered posterior with finite weight variance. Across four standard simulation-based inference (SBI) benchmarks, including the chaotic Lorenz-96 system, our $β$-amortized estimator achieves competitive posterior approximations in standard two-sample metrics, matching non-amortized MCMC-based power-posterior samplers over a wide range of temperatures.
Related papers
- Optimal Unconstrained Self-Distillation in Ridge Regression: Strict Improvements, Precise Asymptotics, and One-Shot Tuning [61.07540493350384]
Self-distillation (SD) is the process of retraining a student on a mixture of ground-truth and the teacher's own predictions.<n>We show that for any prediction risk, the optimally mixed student improves upon the ridge teacher for every regularization level.<n>We propose a consistent one-shot tuning method to estimate $star$ without grid search, sample splitting, or refitting.
arXiv Detail & Related papers (2026-02-19T17:21:15Z) - Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum [62.691095807959215]
We establish an optimal sample complexity of $O(-2)$ for obtaining an $$-optimal global policy using a single-timescale actor-critic (AC) algorithm.<n>These mechanisms are compatible with existing deep learning architectures and require only minor modifications, without compromising practical applicability.
arXiv Detail & Related papers (2026-02-02T00:35:42Z) - Inferring Cosmological Parameters with Evidential Physics-Informed Neural Networks [0.0]
We use a novel variant of Physics-Informed Neural Networks to predict cosmological parameters from recent supernovae and baryon acoustic oscillations (BAO) datasets.<n>Our machine learning framework generates uncertainty estimates for target variables and the inferred unknown parameters of the underlying PDE descriptions.
arXiv Detail & Related papers (2025-09-29T06:25:53Z) - Precise Asymptotics of Bagging Regularized M-estimators [20.077783679095443]
We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators.<n>Key to our analysis is a new result on the joint behavior of correlations between the estimator and residual errors on overlapping subsamples.<n>Joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data.
arXiv Detail & Related papers (2024-09-23T17:48:28Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Posterior Coreset Construction with Kernelized Stein Discrepancy for
Model-Based Reinforcement Learning [78.30395044401321]
We develop a novel model-based approach to reinforcement learning (MBRL)
It relaxes the assumptions on the target transition model to belong to a generic family of mixture models.
It can achieve up-to 50 percent reduction in wall clock time in some continuous control environments.
arXiv Detail & Related papers (2022-06-02T17:27:49Z) - Variational Inference for Bayesian Bridge Regression [0.0]
We study the implementation of Automatic Differentiation Variational inference (ADVI) for Bayesian inference on regression models with bridge penalization.
The bridge approach uses $ell_alpha$ norm, with $alpha in (0, +infty)$ to define a penalization on large values of the regression coefficients.
We illustrate the approach on non-parametric regression models with B-splines, although the method works seamlessly for other choices of basis functions.
arXiv Detail & Related papers (2022-05-19T12:29:09Z) - Observable adjustments in single-index models for regularized
M-estimators [3.5353632767823506]
In regime where sample size $n$ and dimension $p$ are both increasing, the behavior of the empirical distribution of $hatbeta$ and the predicted values $Xhatbeta$ has been previously characterized.
This paper develops a different theory to describe the empirical distribution of $hatbeta$ and $Xhatbeta$.
arXiv Detail & Related papers (2022-04-14T14:32:02Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - The Generalized Lasso with Nonlinear Observations and Generative Priors [63.541900026673055]
We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models.
We show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property.
arXiv Detail & Related papers (2020-06-22T16:43:35Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.