Optimal Variance Control of the Score Function Gradient Estimator for
Importance Weighted Bounds
- URL: http://arxiv.org/abs/2008.01998v2
- Date: Tue, 8 Dec 2020 21:09:28 GMT
- Title: Optimal Variance Control of the Score Function Gradient Estimator for
Importance Weighted Bounds
- Authors: Valentin Liévin, Andrea Dittadi, Anders Christensen, Ole Winther
- Abstract summary: This paper introduces novel results for the score function gradient estimator of the importance weighted variational bound (IWAE)
We prove that in the limit of large $K$ one can choose the control variate such that the Signal-to-Noise ratio (SNR) of the estimator grows as $\sqrt{K}$.
- Score: 12.75471887147565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces novel results for the score function gradient estimator
of the importance weighted variational bound (IWAE). We prove that in the limit
of large $K$ (number of importance samples) one can choose the control variate
such that the Signal-to-Noise ratio (SNR) of the estimator grows as $\sqrt{K}$.
This is in contrast to the standard pathwise gradient estimator where the SNR
decreases as $1/\sqrt{K}$. Based on our theoretical findings we develop a novel
control variate that extends on VIMCO. Empirically, for the training of both
continuous and discrete generative models, the proposed method yields superior
variance reduction, resulting in an SNR for IWAE that increases with $K$
without relying on the reparameterization trick. The novel estimator is
competitive with state-of-the-art reparameterization-free gradient estimators
such as Reweighted Wake-Sleep (RWS) and the thermodynamic variational objective
(TVO) when training generative models.
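To make the setting concrete, below is a minimal NumPy sketch of a score-function (REINFORCE-style) IWAE gradient with a VIMCO-style leave-one-out control variate, the estimator family the paper extends. This is an illustration under simplified assumptions, not the paper's optimal control variate; the function name `vimco_learning_signals` and the 1-D weight layout are ours.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log-sum-exp for a 1-D array."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def vimco_learning_signals(log_w):
    """IWAE bound and VIMCO-style per-sample learning signals.

    log_w : shape (K,), log importance weights
            log w_k = log p(x, z_k) - log q(z_k | x).
    Returns (L_K, signals); the score-function gradient estimate is
        sum_k signals[k] * grad_phi log q(z_k | x),
    i.e. each sample's signal is the IWAE bound minus a leave-one-out baseline.
    """
    K = log_w.shape[0]
    L = logsumexp(log_w) - np.log(K)          # IWAE bound log(1/K sum_k w_k)

    # Baseline for sample k: recompute the bound with log w_k replaced by the
    # mean of the other log-weights (the geometric mean of the other weights).
    loo_mean = (log_w.sum() - log_w) / (K - 1)
    signals = np.empty(K)
    for k in range(K):
        tmp = log_w.copy()
        tmp[k] = loo_mean[k]
        signals[k] = L - (logsumexp(tmp) - np.log(K))
    return L, signals
```

The paper's contribution can be read as replacing the leave-one-out baseline above with a control variate whose coefficients are chosen to minimise the estimator's variance in the large-$K$ limit, which is what yields an SNR that grows as $\sqrt{K}$ rather than decaying.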
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Rényi Neural Processes [14.11793373584558]
We propose Rényi Neural Processes (RNP) to ameliorate the impacts of prior misspecification.
We scale the density ratio $\frac{p}{q}$ by the power of $(1-\alpha)$ in the divergence gradients with respect to the posterior.
Our experiments show consistent log-likelihood improvements over state-of-the-art NP family models.
arXiv Detail & Related papers (2024-05-25T00:14:55Z) - Risk-averse Learning with Non-Stationary Distributions [18.15046585146849]
In this paper, we investigate risk-averse online optimization where the distribution of the random cost changes over time.
We minimize a risk-averse objective function using the Conditional Value at Risk (CVaR) as the risk measure.
We show that our designed learning algorithm achieves sub-linear dynamic regret with high probability for both convex and strongly convex functions.
arXiv Detail & Related papers (2024-04-03T18:16:47Z) - Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - U-Statistics for Importance-Weighted Variational Inference [29.750633016889655]
We propose the use of U-statistics to reduce variance for estimation in importance-weighted variational inference.
We find empirically that U-statistic variance reduction can lead to modest to significant improvements in inference performance on a range of models.
arXiv Detail & Related papers (2023-02-27T16:08:43Z) - Estimation of Non-Crossing Quantile Regression Process with Deep ReQU
Neural Networks [5.5272015676880795]
We propose a penalized nonparametric approach to estimating the quantile regression process (QRP) in a nonseparable model using rectified quadratic unit (ReQU) activated deep neural networks.
We establish the non-asymptotic excess risk bounds for the estimated QRP and derive the mean integrated squared error for the estimated QRP under mild smoothness and regularity conditions.
arXiv Detail & Related papers (2022-07-21T12:26:45Z) - Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
arXiv Detail & Related papers (2022-04-18T17:53:44Z) - On Signal-to-Noise Ratio Issues in Variational Inference for Deep
Gaussian Processes [55.62520135103578]
We show that the gradient estimates used in training Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues.
We show that our fix can lead to consistent improvements in the predictive performance of DGP models.
arXiv Detail & Related papers (2020-11-01T14:38:02Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
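For the SUMO entry above, the "randomized truncation of infinite series" refers to a Russian-roulette construction. The following is a generic, hedged sketch of that idea, not the SUMO implementation; the function names and the geometric tail distribution are illustrative assumptions.

```python
import numpy as np

def randomized_truncation_estimate(delta, tail_prob, rng=np.random.default_rng()):
    """Single-sample unbiased estimate of sum_{k>=1} delta(k) by randomized
    truncation (Russian roulette).

    delta(k)     : k-th series increment, e.g. the gap between successive
                   importance-weighted bounds in a SUMO-style estimator.
    tail_prob(k) : P(truncation level >= k); must satisfy tail_prob(1) == 1
                   and be strictly positive for all k.
    """
    total, k = 0.0, 1
    while True:
        total += delta(k) / tail_prob(k)      # inverse-probability weighting
        # stop after term k with probability 1 - P(level >= k+1 | level >= k)
        if rng.random() >= tail_prob(k + 1) / tail_prob(k):
            return total
        k += 1

# Example: geometric tail P(level >= k) = 0.5**(k-1) and a toy geometrically
# decaying series delta(k) = 0.5**k, whose true sum is 1.
est = randomized_truncation_estimate(lambda k: 0.5**k,
                                     lambda k: 0.5**(k - 1))
```

Unbiasedness follows because each term delta(k) is included with probability tail_prob(k) and reweighted by 1/tail_prob(k).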