Unnormalized Variational Bayes
- URL: http://arxiv.org/abs/2007.15130v1
- Date: Wed, 29 Jul 2020 21:58:54 GMT
- Title: Unnormalized Variational Bayes
- Authors: Saeed Saremi
- Abstract summary: We unify empirical Bayes and variational Bayes for approximating unnormalized densities.
This framework, named unnormalized variational Bayes (UVB), is based on formulating a latent variable model for the random variable $Y=X+N(0,\sigma^2 I_d)$.
- Score: 1.599072005190786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We unify empirical Bayes and variational Bayes for approximating unnormalized
densities. This framework, named unnormalized variational Bayes (UVB), is based
on formulating a latent variable model for the random variable
$Y=X+N(0,\sigma^2 I_d)$ and using the evidence lower bound (ELBO), computed by
a variational autoencoder, as a parametrization of the energy function of $Y$
which is then used to estimate $X$ with the empirical Bayes least-squares
estimator. In this intriguing setup, the $\textit{gradient}$ of the ELBO with
respect to noisy inputs plays the central role in learning the energy function.
Empirically, we demonstrate that UVB has a higher capacity to approximate
energy functions than the parametrization with MLPs as done in neural empirical
Bayes (DEEN). We especially showcase $\sigma=1$, where the differences between
UVB and DEEN become visible and qualitative in the denoising experiments. For
this high level of noise, the distribution of $Y$ is very smoothed and we
demonstrate that one can traverse in a single run $-$ without a restart $-$ all
MNIST classes in a variety of styles via walk-jump sampling with a fast-mixing
Langevin MCMC sampler. We finish by probing the encoder/decoder of the trained
models and confirm UVB $\neq$ VAE.
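The walk-jump idea mentioned in the abstract can be illustrated with a minimal sketch, under strong simplifying assumptions. Here $X$ is a toy Gaussian, so the score of $Y$ is available in closed form; in UVB the score would instead come from the gradient of the learned energy with respect to the noisy input. The names `score_y`, `walk`, and `jump`, the toy parameters, and the choice of plain unadjusted overdamped Langevin dynamics are all illustrative assumptions, not the paper's implementation. The "jump" uses the standard empirical Bayes least-squares estimator $\hat{x}(y) = y + \sigma^2 \nabla_y \log p(y)$.

```python
import numpy as np

# Toy setup (all names and values are illustrative assumptions):
# X ~ N(mu, tau^2 I_d), so Y = X + N(0, sigma^2 I_d) ~ N(mu, (tau^2 + sigma^2) I_d)
# and the score of Y is known exactly. A learned model would replace score_y.
mu, tau, sigma, d = 2.0, 0.5, 1.0, 2


def score_y(y):
    """Gradient of log p(y) for the toy Gaussian smoothed density."""
    return -(y - mu) / (tau**2 + sigma**2)


def jump(y):
    """Empirical Bayes least-squares estimate: x_hat = y + sigma^2 * score(y)."""
    return y + sigma**2 * score_y(y)


def walk(y, n_steps, step, rng):
    """Unadjusted overdamped Langevin MCMC on the smoothed density of Y."""
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y + step * score_y(y) + np.sqrt(2.0 * step) * noise
    return y


rng = np.random.default_rng(0)
y = rng.standard_normal(d)                      # arbitrary starting point
y = walk(y, n_steps=1000, step=0.05, rng=rng)   # walk: sample in noisy space
x_hat = jump(y)                                 # jump: denoise the sample
print("noisy sample y:", y)
print("denoised x_hat:", x_hat)
```

For this Gaussian toy case the jump coincides with the exact posterior mean $(\tau^2 y + \sigma^2 \mu)/(\tau^2 + \sigma^2)$, which makes the sketch easy to check by hand.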
Related papers
- Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate Operators [4.887201041798969]
We introduce Latent-IMH, a sampling method based on the Metropolis-Hastings independence (IMH) sampler.
Latent-IMH first generates intermediate latent variables using the approximate $\tilde{A}$, and then refines them using the exact $A$.
We theoretically analyze the performance of Latent-IMH using KL divergence and mixing time bounds.
arXiv Detail & Related papers (2026-01-28T03:44:01Z) - CarBoN: Calibrated Best-of-N Sampling Improves Test-time Reasoning [62.56541355300587]
We introduce a general test-time calibration framework that adaptively modifies the model toward high-reward reasoning paths.
Within this framework, we propose CarBoN, a two-phase method that first explores the solution space and then learns a calibration of the logits.
Experiments on MATH-500 and AIME-2024 show that CarBoN improves efficiency, with up to $4\times$ fewer rollouts to reach the same accuracy.
arXiv Detail & Related papers (2025-10-17T14:04:37Z) - Solving Empirical Bayes via Transformers [18.654470796004265]
This work applies modern AI tools (transformers) to one of the oldest statistical problems: estimating Poisson means in the empirical Bayes (Poisson-EB) setting.
A transformer model is pre-trained on a set of synthetically generated pairs $(X,\theta)$ and learns to do in-context learning (ICL) by adapting to unknown $\pi$.
arXiv Detail & Related papers (2025-02-14T01:06:15Z) - A phase transition in sampling from Restricted Boltzmann Machines [2.6624014064407717]
We prove a phase transition phenomenon in the mixing time of the Gibbs sampler for a Restricted Boltzmann Machine.
A key insight is the link between the Gibbs sampler and a dynamical system.
arXiv Detail & Related papers (2024-10-10T23:51:22Z) - Beta-Sigma VAE: Separating beta and decoder variance in Gaussian variational autoencoder [3.842994409438228]
Variational autoencoder (VAE) is an established generative model but is notorious for its blurriness.
In this work, we investigate the blurry output problem of VAE and resolve it, exploiting the variance of Gaussian decoder and $\beta$ of beta-VAE.
To address the problem, we propose Beta-Sigma VAE (BS-VAE) that explicitly separates $\beta$ and decoder variance $\sigma^2_x$ in the model.
arXiv Detail & Related papers (2024-09-14T08:28:19Z) - Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers [28.678613691787096]
Previous approximations rely on the posterior means, which may not lie in the support of the image distribution.
We introduce a novel approach for posterior approximation that guarantees to generate valid samples within the support of the image distribution.
arXiv Detail & Related papers (2024-02-09T02:23:47Z) - Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature and Bayesian Optimization [51.533164528799084]
We show that to estimate the normalizing constant within a small relative error, the level of difficulty depends on the value of $\lambda$.
We find that this pattern holds true even when the function evaluations are noisy.
arXiv Detail & Related papers (2024-01-11T07:45:09Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Contextual Combinatorial Bandits with Probabilistically Triggered Arms [55.9237004478033]
We study contextual bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions.
Under the triggering modulated (TPM) condition, we devise the C$^2$-UC-T algorithm and derive a regret bound $\tilde{O}(d\sqrt{T})$.
arXiv Detail & Related papers (2023-03-30T02:51:00Z) - Neural Inference of Gaussian Processes for Time Series Data of Quasars [72.79083473275742]
We introduce a new model that enables it to describe quasar spectra completely.
We also introduce a new method of inference of Gaussian process parameters, which we call $\textit{Neural Inference}$.
The combination of both the CDRW model and Neural Inference significantly outperforms the baseline DRW and MLE.
arXiv Detail & Related papers (2022-11-17T13:01:26Z) - The Projected Covariance Measure for assumption-lean variable significance testing [3.8936058127056357]
A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero.
We study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$.
We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests.
arXiv Detail & Related papers (2022-11-03T17:55:50Z) - Revealing Unobservables by Deep Learning: Generative Element Extraction Networks (GEEN) [5.3028918247347585]
This paper proposes a novel method for estimating realizations of a latent variable $X^*$ in a random sample.
To the best of our knowledge, this paper is the first to provide such identification in observation.
arXiv Detail & Related papers (2022-10-04T01:09:05Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$ accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $\mathcal{N}(0, \Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2}+\frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Learning and Inference in Imaginary Noise Models [1.599072005190786]
A notion of smoothed variational inference emerges where the smoothing is implicitly enforced by the noise model of the decoder.
This is the concept of imaginary noise model, where the noise model dictates the functional form of the variational lower bound $\mathcal{L}(\sigma)$, but the noisy data are never seen during learning.
We report an intriguing power law $\mathcal{D}_{\rm KL} \sim \sigma^{-\nu}$ for the learned models and we study the inference in the $\sigma$-VAE for unseen noisy
arXiv Detail & Related papers (2020-05-18T19:38:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.