Joint control variate for faster black-box variational inference
- URL: http://arxiv.org/abs/2210.07290v4
- Date: Fri, 8 Mar 2024 05:04:23 GMT
- Title: Joint control variate for faster black-box variational inference
- Authors: Xi Wang, Tomas Geffner, Justin Domke
- Abstract summary: Black-box variational inference performance is sometimes hindered by the use of gradient estimators with high variance.
This variance comes from two sources of randomness: data subsampling and Monte Carlo sampling.
- Score: 32.38477249925875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Black-box variational inference performance is sometimes hindered by the use
of gradient estimators with high variance. This variance comes from two sources
of randomness: data subsampling and Monte Carlo sampling. While existing
control variates only address Monte Carlo noise, and incremental gradient
methods typically only address data subsampling, we propose a new "joint"
control variate that jointly reduces variance from both sources of noise. This
significantly reduces gradient variance, leading to faster optimization in
several applications.
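To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how a control variate that shares both the minibatch index and the Monte Carlo noise with the base estimator can reduce variance from both sources at once. The objective, dataset, and reference point are illustrative assumptions.

```python
# Toy sketch of a "joint" control variate for a doubly stochastic gradient.
# An illustration under simplifying assumptions, not the paper's code.
# Setting: minimize F(theta) = (1/N) * sum_i E_{eps~N(0,1)}[ f(theta, x_i, eps) ]
# with f(theta, x, eps) = 0.5 * (theta - x - eps)**2, whose exact gradient is
# theta - mean(x).
import numpy as np

rng = np.random.default_rng(0)
N = 1000
x = rng.normal(loc=3.0, scale=2.0, size=N)  # illustrative "dataset"
theta = 0.0
theta_ref = 0.5  # stale reference point where the control variate is evaluated

def grad_f(theta, xi, eps):
    # Per-example, per-sample gradient of f.
    return theta - xi - eps

# Expectation of the control variate over BOTH noise sources, computed once:
# E_{i,eps}[grad_f(theta_ref, x_i, eps)] = theta_ref - mean(x).
cv_expectation = theta_ref - x.mean()

def naive_estimate():
    i, eps = rng.integers(N), rng.normal()
    return grad_f(theta, x[i], eps)

def joint_cv_estimate():
    i, eps = rng.integers(N), rng.normal()   # one draw of BOTH noise sources
    g = grad_f(theta, x[i], eps)             # noisy gradient estimate
    h = grad_f(theta_ref, x[i], eps)         # control variate: same i, same eps
    return g - h + cv_expectation            # still unbiased for the true gradient

naive = np.array([naive_estimate() for _ in range(10_000)])
joint = np.array([joint_cv_estimate() for _ in range(10_000)])
print("exact gradient:", theta - x.mean())
print("naive    mean / var:", naive.mean(), naive.var())
print("joint CV mean / var:", joint.mean(), joint.var())
# Because this toy gradient is linear in (x_i, eps), the joint control variate
# cancels all the noise; in realistic problems it only reduces the variance.
```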
Related papers
- An Elementary Approach to Scheduling in Generative Diffusion Models [55.171367482496755]
An elementary approach to characterizing the impact of noise scheduling and time discretization in generative diffusion models is developed.
Experiments across different datasets and pretrained models demonstrate that the time discretization strategy selected by our approach consistently outperforms baseline and search-based strategies.
arXiv Detail & Related papers (2026-01-20T05:06:26Z)
- Sequential Monte Carlo for Inclusive KL Minimization in Amortized Variational Inference [3.126959812401426]
We propose SMC-Wake, a procedure for fitting an amortized variational approximation that uses sequential Monte Carlo samplers to estimate the gradient of the inclusive KL divergence.
In experiments with both simulated and real datasets, SMC-Wake fits variational distributions that approximate the posterior more accurately than existing methods.
arXiv Detail & Related papers (2024-03-15T18:13:48Z)
- When can Regression-Adjusted Control Variates Help? Rare Events, Sobolev Embedding and Minimax Optimality [10.21792151799121]
We show that a machine learning-based estimator can be used to mitigate the variance of Monte Carlo sampling.
In the presence of rare and extreme events, a truncated version of the Monte Carlo algorithm can achieve the minimax optimal rate.
arXiv Detail & Related papers (2023-05-25T23:09:55Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z)
- Approximation Based Variance Reduction for Reparameterization Gradients [38.73307745906571]
Flexible variational distributions improve variational inference but are harder to optimize.
We present a control variate that is applicable to any reparameterizable distribution with known mean and covariance matrix (see the sketch after this list).
It leads to large improvements in gradient variance and optimization convergence for inference with non-factorized variational distributions.
arXiv Detail & Related papers (2020-07-29T06:55:11Z)
- A Study of Gradient Variance in Deep Learning [56.437755740715396]
We introduce a method, Gradient Clustering, to minimize the variance of average mini-batch gradient with stratified sampling.
We measure the gradient variance on common deep learning benchmarks and observe that, contrary to common assumptions, gradient variance increases during training.
arXiv Detail & Related papers (2020-07-09T03:23:10Z)
- Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization [62.47170258504037]
This paper presents a framework that encompasses and generalizes existing approaches that use controls, kernels and neural networks.
Novel theoretical results are presented to provide insight into the variance reduction that can be achieved, and an empirical assessment, including applications to Bayesian inference, is provided in support.
arXiv Detail & Related papers (2020-06-12T22:03:25Z)
- Amortized variance reduction for doubly stochastic objectives [17.064916635597417]
Approximate inference in complex probabilistic models requires optimisation of doubly stochastic objective functions.
Current approaches do not take into account how mini-batch selection interacts with Monte Carlo sampling, resulting in sub-optimal variance reduction.
We propose a new approach in which we use a recognition network to cheaply approximate the optimal control variate for each mini-batch, with no additional gradient computations.
arXiv Detail & Related papers (2020-03-09T13:23:14Z)
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)
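As noted in the reparameterization-gradients entry above, here is a minimal sketch of a control variate for reparameterization gradients when the variational distribution has a known mean and covariance. It is an illustration under assumed choices (a one-dimensional Gaussian and an exp target function), not any cited paper's implementation.

```python
# Toy control variate for a reparameterization gradient; illustrative only.
# Setting: q(z) = N(mu, sigma^2) with known moments, z = mu + sigma * eps,
# and we estimate d/dmu E_q[exp(z)] = E_q[exp(z)] = exp(mu + sigma^2 / 2).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.3, 0.8  # assumed variational parameters

def grad_naive(eps):
    # Plain reparameterization gradient sample: d/dmu exp(mu + sigma * eps).
    return np.exp(mu + sigma * eps)

def grad_cv(eps):
    # Control variate: gradient of a quadratic (Taylor) approximation of exp
    # around mu. Its expectation under eps ~ N(0, 1) is exp(mu) in closed form,
    # so subtracting the centered approximation keeps the estimator unbiased.
    z = mu + sigma * eps
    approx_grad = np.exp(mu) + np.exp(mu) * (z - mu)  # d/dz of the quadratic
    return np.exp(z) - (approx_grad - np.exp(mu))     # E[approx_grad] = exp(mu)

eps = rng.normal(size=100_000)
exact = np.exp(mu + sigma**2 / 2)  # closed-form gradient for this toy target
print("exact:", exact)
print("naive mean / var:", grad_naive(eps).mean(), grad_naive(eps).var())
print("CV    mean / var:", grad_cv(eps).mean(), grad_cv(eps).var())
```

Subtracting the centered linear term removes the component of the noise that is correlated with eps, which is where most of the variance lives for moderate sigma; the estimator's mean is unchanged.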
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.