On Signal-to-Noise Ratio Issues in Variational Inference for Deep
Gaussian Processes
- URL: http://arxiv.org/abs/2011.00515v2
- Date: Wed, 21 Jul 2021 12:14:08 GMT
- Title: On Signal-to-Noise Ratio Issues in Variational Inference for Deep
Gaussian Processes
- Authors: Tim G. J. Rudner, Oscar Key, Yarin Gal, Tom Rainforth
- Abstract summary: We show that the gradient estimates used in training Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues.
We show that our fix can lead to consistent improvements in the predictive performance of DGP models.
- Score: 55.62520135103578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that the gradient estimates used in training Deep Gaussian Processes
(DGPs) with importance-weighted variational inference are susceptible to
signal-to-noise ratio (SNR) issues. Specifically, we show both theoretically
and via an extensive empirical evaluation that the SNR of the gradient
estimates for the latent variable's variational parameters decreases as the
number of importance samples increases. As a result, these gradient estimates
degrade to pure noise if the number of importance samples is too large. To
address this pathology, we show how doubly reparameterized gradient estimators,
originally proposed for training variational autoencoders, can be adapted to
the DGP setting and that the resultant estimators completely remedy the SNR
issue, thereby providing more reliable training. Finally, we demonstrate that
our fix can lead to consistent improvements in the predictive performance of
DGP models.
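The sketch below is a minimal illustration of the pathology described in the abstract, using a toy single-latent Gaussian model rather than a DGP; the model, the observation x_obs, the parameter values, and the number of repeats are all illustrative assumptions, not the authors' setup. It estimates the SNR of reparameterized importance-weighted gradient estimates as |mean| / std over repeated independent draws, so one can watch how the SNR of the variational-parameter gradients behaves as the number of importance samples K grows.

```python
# Minimal sketch (toy single-latent Gaussian model, not the paper's DGP code):
# measure the SNR of reparameterized importance-weighted gradient estimates,
# SNR = |mean(grad)| / std(grad) over repeated independent estimates.
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp
from jax.scipy.stats import norm

x_obs = 1.5  # hypothetical observation; prior p(z)=N(0,1), likelihood p(x|z)=N(z,1)

def iwae_bound(params, key, K):
    """Importance-weighted bound with K reparameterized samples from q(z)=N(mu, sigma^2)."""
    mu, log_sigma = params
    eps = jax.random.normal(key, (K,))
    z = mu + jnp.exp(log_sigma) * eps                               # z_k = mu + sigma * eps_k
    log_p = norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x_obs, z, 1.0)   # log p(x, z_k)
    log_q = norm.logpdf(z, mu, jnp.exp(log_sigma))                  # log q(z_k)
    return logsumexp(log_p - log_q) - jnp.log(K)                    # log (1/K) sum_k w_k

def gradient_snr(params, K, n_repeats=2000, seed=0):
    """SNR per variational parameter, estimated from repeated gradient draws."""
    keys = jax.random.split(jax.random.PRNGKey(seed), n_repeats)
    grads = jax.vmap(jax.grad(iwae_bound), in_axes=(None, 0, None))(params, keys, K)
    return jnp.abs(grads.mean(axis=0)) / grads.std(axis=0)

params = jnp.array([0.5, -0.5])  # illustrative (mu, log_sigma)
for K in (1, 8, 64, 512):
    print(K, gradient_snr(params, K))  # SNR of [d/dmu, d/dlog_sigma] estimates
```

In line with the abstract's claim, the SNR measured this way typically shrinks for the variational parameters as K increases, even though the bound itself becomes tighter.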
Related papers
- Understanding and mitigating difficulties in posterior predictive evaluation [20.894503281724052]
We observe that the signal-to-noise ratio (SNR) of the simple Monte Carlo estimators used for posterior predictive evaluation can be extremely low.
We propose replacing simple MC sampling with importance sampling using a proposal distribution optimized at test time on a variational proxy for the SNR.
arXiv Detail & Related papers (2024-05-30T06:50:28Z)
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
- Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on the gradient signal-to-noise ratio (GSNR) of the network's parameters.
arXiv Detail & Related papers (2023-10-11T10:21:34Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Generalized Doubly Reparameterized Gradient Estimators [18.253352549048564]
We develop two generalizations of the DReGs estimator and show that they can be used to train conditional and hierarchical VAEs on image modelling tasks more effectively.
We first extend the estimator to hierarchical models with several layers by showing how to treat additional score function terms due to the hierarchical variational posterior.
We then generalize DReGs to score functions of arbitrary distributions instead of just those of the sampling distribution, which makes the estimator applicable to the parameters of the prior in addition to those of the posterior.
arXiv Detail & Related papers (2021-01-26T19:30:00Z)
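To make the DReG idea in the entry above concrete, here is a minimal sketch of one common way to implement the basic doubly reparameterized estimator for the variational parameters, in the same toy single-latent Gaussian setting as the earlier sketch; this is an illustrative assumption, not code from any of these papers. The variational parameters are stop-gradiented inside log q so that only the pathwise dependence through z is differentiated, and the squared normalized importance weights are applied as constants.

```python
# Minimal sketch of the basic DReG surrogate (variational parameters only),
# in the same toy Gaussian setting as the earlier sketch; illustrative only,
# not code from any of the papers listed here.
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

x_obs = 1.5  # hypothetical observation; p(z)=N(0,1), p(x|z)=N(z,1)

def dreg_surrogate(params, key, K):
    """Differentiating this surrogate w.r.t. params yields the DReG estimator
    sum_k w_tilde_k^2 * (d log w_k / d z_k) * (d z_k / d params)."""
    mu, log_sigma = params
    eps = jax.random.normal(key, (K,))
    z = mu + jnp.exp(log_sigma) * eps                        # pathwise samples from q
    # Stop gradients on the q-parameters inside log q, so only the dependence
    # through z (the reparameterization path) is differentiated.
    mu_sg, log_sigma_sg = jax.lax.stop_gradient(params)
    log_p = norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x_obs, z, 1.0)
    log_w = log_p - norm.logpdf(z, mu_sg, jnp.exp(log_sigma_sg))
    w_tilde = jax.lax.stop_gradient(jax.nn.softmax(log_w))   # normalized weights, held constant
    return jnp.sum(w_tilde ** 2 * log_w)

# Usage: one gradient estimate for (mu, log_sigma) with K = 64 importance samples.
grad = jax.grad(dreg_surrogate)(jnp.array([0.5, -0.5]), jax.random.PRNGKey(0), 64)
print(grad)
```

Differentiating this surrogate drops the score-function term responsible for the SNR decay, which is the mechanism the paper above generalizes to hierarchical posteriors and to prior parameters.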
- Optimal Variance Control of the Score Function Gradient Estimator for Importance Weighted Bounds [12.75471887147565]
This paper introduces novel results for the score function gradient estimator of the importance weighted variational bound (IWAE).
We prove that in the limit of large $K$ one can choose the control variate such that the signal-to-noise ratio (SNR) of the estimator grows as $\sqrt{K}$.
arXiv Detail & Related papers (2020-08-05T08:41:46Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Deep Sigma Point Processes [22.5396672566053]
We introduce a class of parametric models inspired by the compositional structure of Deep Gaussian Processes (DGPs).
Deep Sigma Point Processes (DSPPs) retain many of the attractive features of (variational) DGPs, including mini-batch training and predictive uncertainty that is controlled by kernel basis functions.
arXiv Detail & Related papers (2020-02-21T03:40:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.