Stochastic Variational Inference with Tuneable Stochastic Annealing
- URL: http://arxiv.org/abs/2504.03902v1
- Date: Fri, 04 Apr 2025 19:46:10 GMT
- Title: Stochastic Variational Inference with Tuneable Stochastic Annealing
- Authors: John Paisley, Ghazal Fazelnia, Brian Barr,
- Abstract summary: We present a modified variational inference (SVI) approach applicable to both large and small datasets.<n>We are motivated by the fact that, in SVI, the larger the batch size the more approximately Gaussian is the intrinsic noise, but the smaller its variance.<n>We propose a simple method for achieving both goals of having larger variance noise to escape bad local optimal solutions and more data information to obtain more accurate gradient directions.
- Score: 1.6385815610837167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we exploit the observation that stochastic variational inference (SVI) is a form of annealing and present a modified SVI approach -- applicable to both large and small datasets -- that allows the amount of annealing done by SVI to be tuned. We are motivated by the fact that, in SVI, the larger the batch size the more approximately Gaussian is the intrinsic noise, but the smaller its variance. This low variance reduces the amount of annealing which is needed to escape bad local optimal solutions. We propose a simple method for achieving both goals of having larger variance noise to escape bad local optimal solutions and more data information to obtain more accurate gradient directions. The idea is to set an actual batch size, which may be the size of the data set, and a smaller effective batch size that matches the larger level of variance at this smaller batch size. The result is an approximation to the maximum entropy stochastic gradient at this variance level. We theoretically motivate our approach for the framework of conjugate exponential family models and illustrate the method empirically on the probabilistic matrix factorization collaborative filter, the Latent Dirichlet Allocation topic model, and the Gaussian mixture model.
Related papers
- Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling [22.256068524699472]
In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues.
We combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution.
Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
arXiv Detail & Related papers (2024-08-13T08:09:05Z) - A Bayesian Approach Toward Robust Multidimensional Ellipsoid-Specific Fitting [0.0]
This work presents a novel and effective method for fitting multidimensional ellipsoids to scattered data in the contamination of noise and outliers.
We incorporate a uniform prior distribution to constrain the search for primitive parameters within an ellipsoidal domain.
We apply it to a wide range of practical applications such as microscopy cell counting, 3D reconstruction, geometric shape approximation, and magnetometer calibration tasks.
arXiv Detail & Related papers (2024-07-27T14:31:51Z) - Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities [0.5592394503914488]
We introduce a method for identifying and approximating a target measure $pi$ as a perturbation of a given reference measure $mu$.
Our main contribution unveils a connection between the emphdimensional logarithmic Sobolev inequality (LSI) and approximations with this ansatz.
arXiv Detail & Related papers (2024-06-18T20:02:44Z) - Implicit Manifold Gaussian Process Regression [49.0787777751317]
Gaussian process regression is widely used to provide well-calibrated uncertainty estimates.
It struggles with high-dimensional data because of the implicit low-dimensional manifold upon which the data actually lies.
In this paper we propose a technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way.
arXiv Detail & Related papers (2023-10-30T09:52:48Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Preferential Subsampling for Stochastic Gradient Langevin Dynamics [3.158346511479111]
gradient MCMC offers an unbiased estimate of the gradient of the log-posterior with a small, uniformly-weighted subsample of the data.
The resulting gradient estimator may exhibit a high variance and impact sampler performance.
We demonstrate that such an approach can maintain the same level of accuracy while substantially reducing the average subsample size that is used.
arXiv Detail & Related papers (2022-10-28T14:56:18Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Carath\'eodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carath'eodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z) - Amortized variance reduction for doubly stochastic objectives [17.064916635597417]
Approximate inference in complex probabilistic models requires optimisation of doubly objective functions.
Current approaches do not take into account how mini-batchity affects samplingity, resulting in sub-optimal variance reduction.
We propose a new approach in which we use a recognition network to cheaply approximate the optimal control variate for each mini-batch, with no additional gradient computations.
arXiv Detail & Related papers (2020-03-09T13:23:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.