Amortized variance reduction for doubly stochastic objectives
- URL: http://arxiv.org/abs/2003.04125v1
- Date: Mon, 9 Mar 2020 13:23:14 GMT
- Title: Amortized variance reduction for doubly stochastic objectives
- Authors: Ayman Boustati, Sattar Vakili, James Hensman, ST John
- Abstract summary: Approximate inference in complex probabilistic models requires the optimisation of doubly stochastic objective functions.
Current approaches do not take into account how mini-batch stochasticity affects sampling stochasticity, resulting in sub-optimal variance reduction.
We propose a new approach in which we use a recognition network to cheaply approximate the optimal control variate for each mini-batch, with no additional gradient computations.
- Score: 17.064916635597417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Approximate inference in complex probabilistic models such as deep Gaussian
processes requires the optimisation of doubly stochastic objective functions.
These objectives incorporate randomness both from mini-batch subsampling of the
data and from Monte Carlo estimation of expectations. If the gradient variance
is high, the stochastic optimisation problem becomes difficult with a slow rate
of convergence. Control variates can be used to reduce the variance, but past
approaches do not take into account how mini-batch stochasticity affects
sampling stochasticity, resulting in sub-optimal variance reduction. We propose
a new approach in which we use a recognition network to cheaply approximate the
optimal control variate for each mini-batch, with no additional model gradient
computations. We illustrate the properties of this proposal and test its
performance on logistic regression and deep Gaussian processes.
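To make the mechanism concrete, here is a minimal NumPy sketch of the control-variate construction the abstract builds on: a Monte Carlo estimate of an expectation is combined with a zero-mean control variate using the per-mini-batch optimal coefficient a* = Cov(f, h) / Var(h). The toy integrand, the control variate, and the per-batch "scale" are illustrative stand-ins, not the paper's model; the paper's proposal is to predict this coefficient cheaply with a recognition network rather than re-estimating it for every batch.

```python
# Minimal control-variate sketch (NumPy). The integrand f, the control variate h,
# and the per-mini-batch "scale" are illustrative stand-ins, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(scale, n_samples, use_cv):
    """Monte Carlo estimate of E_eps[exp(scale * eps)] with eps ~ N(0, 1).

    With use_cv=True, subtract a * h(eps) where h(eps) = eps has known mean 0 and
    a is the empirically optimal coefficient a* = Cov(f, h) / Var(h). The paper
    proposes predicting this per-mini-batch coefficient with a recognition network
    instead of re-estimating it, avoiding extra model gradient evaluations.
    """
    eps = rng.standard_normal(n_samples)
    f = np.exp(scale * eps)                     # samples of the quantity of interest
    if not use_cv:
        return f.mean()
    h = eps                                     # control variate, E[h] = 0
    a = np.cov(f, h)[0, 1] / np.var(h, ddof=1)  # optimal coefficient for this batch
    return (f - a * h).mean()

# Each "mini-batch" has its own scale, so the optimal coefficient differs per batch.
for scale in (0.3, 0.6, 0.9):
    plain = np.var([mc_estimate(scale, 64, use_cv=False) for _ in range(500)])
    with_cv = np.var([mc_estimate(scale, 64, use_cv=True) for _ in range(500)])
    print(f"scale={scale}: estimator variance plain={plain:.4f}  with CV={with_cv:.4f}")
```

In this sketch the coefficient is re-estimated from the same samples for every batch; amortising that estimate with a cheap recognition network is what removes the extra cost while retaining the variance reduction.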
Related papers
- Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
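As an illustration of how coupling the random features can reduce variance, the following NumPy sketch compares i.i.d. Gaussian frequencies with an orthogonal coupling for the RBF kernel. This is a generic, well-known coupling used only as an example; it is not claimed to be the construction studied in the cited paper.

```python
# Illustrative coupling for random Fourier features (NumPy): orthogonal frequency
# blocks versus i.i.d. Gaussian frequencies for the RBF kernel.
import numpy as np

rng = np.random.default_rng(1)
d = 4                                         # input dimension = features per block
x, y = np.zeros(d), np.full(d, 0.5)           # ||x - y||^2 = 1
true_k = np.exp(-0.5 * np.sum((x - y) ** 2))  # RBF kernel with unit lengthscale

def rff_estimate(W):
    # Unbiased RFF estimate of the RBF kernel: k(x, y) = E_w[cos(w^T (x - y))].
    return np.mean(np.cos(W @ (x - y)))

def iid_frequencies():
    return rng.standard_normal((d, d))

def orthogonal_frequencies():
    # Orthogonalise a Gaussian block, then rescale rows by chi-distributed norms
    # so each row is still marginally N(0, I) and the estimator stays unbiased.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    norms = np.sqrt(rng.chisquare(df=d, size=d))
    return norms[:, None] * Q

iid = [rff_estimate(iid_frequencies()) for _ in range(5000)]
orth = [rff_estimate(orthogonal_frequencies()) for _ in range(5000)]
print(f"true kernel value      : {true_k:.4f}")
print(f"i.i.d. frequencies     : mean={np.mean(iid):.4f}  var={np.var(iid):.5f}")
print(f"orthogonal frequencies : mean={np.mean(orth):.4f}  var={np.var(orth):.5f}")
```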
arXiv Detail & Related papers (2024-05-26T12:25:09Z)
- Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
Quantiles of the gradient norms are used as clipping thresholds, making the updates robust to outliers.
We propose an implementation of the algorithm using rolling quantile estimates.
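A minimal sketch of the general idea follows, assuming a rolling-window quantile of recent gradient norms as the clipping threshold; the window size, quantile level, and toy objective are illustrative choices, not the cited paper's exact algorithm.

```python
# Sketch of quantile-based gradient clipping for SGD (NumPy). The rolling-quantile
# threshold below is an illustrative choice, not the cited paper's exact rule.
import numpy as np

rng = np.random.default_rng(2)
w = np.zeros(2)
recent_norms, lr, q = [], 0.1, 0.9           # window of recent gradient norms

def noisy_grad(w):
    # Gradient of 0.5 * ||w - 1||^2 with heavy-tailed (Student-t) noise -> outliers.
    return (w - 1.0) + rng.standard_t(df=2, size=2)

for step in range(200):
    g = noisy_grad(w)
    recent_norms.append(np.linalg.norm(g))
    recent_norms = recent_norms[-50:]        # rolling window
    tau = np.quantile(recent_norms, q)       # clipping threshold = q-th norm quantile
    g = g * min(1.0, tau / (np.linalg.norm(g) + 1e-12))
    w -= lr * g

print("final iterate:", w)  # noisy, but should stay in a neighbourhood of the optimum [1, 1]
```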
arXiv Detail & Related papers (2023-09-29T15:24:48Z)
- Joint Sampling and Optimisation for Inverse Rendering [24.290038684298164]
Averaging many gradient samples in each iteration trivially reduces the variance of the gradient estimate.
We derive a theoretical framework for interleaving sampling and optimisation.
We implement our method for inverse path tracing and demonstrate how our estimator speeds up convergence on difficult optimisation tasks.
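For reference, here is a tiny sketch of the baseline mentioned above: averaging N i.i.d. gradient samples reduces the estimator variance roughly as 1/N, at N times the sampling cost. The toy gradient below is a stand-in, not the paper's rendering estimator.

```python
# Averaging i.i.d. Monte Carlo gradient samples reduces variance ~ 1/N (toy sketch).
import numpy as np

rng = np.random.default_rng(3)

def one_grad_sample():
    # Stand-in for a single noisy gradient estimate (true gradient = 2.0).
    return 2.0 + rng.standard_normal()

for n in (1, 4, 16, 64):
    estimates = [np.mean([one_grad_sample() for _ in range(n)]) for _ in range(2000)]
    print(f"N={n:3d}: variance of averaged gradient = {np.var(estimates):.4f}")
```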
arXiv Detail & Related papers (2023-09-27T14:21:13Z)
- Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes [23.87733307119697]
We introduce Neural Operator Variational Inference (NOVI) for Deep Gaussian Processes.
NOVI uses a neural generator to obtain a sampler and minimizes the Regularized Stein Discrepancy in L2 space between the generated distribution and true posterior.
We demonstrate that the bias introduced by our method can be controlled by multiplying the divergence with a constant, which leads to robust error control and ensures the stability and precision of the algorithm.
arXiv Detail & Related papers (2023-09-22T06:56:35Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed, formulating the objective as the logistic loss between the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
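A compact sketch of the NCE objective referred to above: an unnormalised 1-D model with a learned log-normaliser is fitted by the logistic loss between real data and artificial noise. The model, the noise distribution, and the crude grid-search fit are illustrative assumptions, not the cited paper's method.

```python
# Minimal NCE sketch (NumPy): logistic loss between real data and artificial noise
# for an unnormalised 1-D model with a learned log-normaliser.
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=2.0, scale=1.0, size=2000)    # "real" data
noise = rng.normal(loc=0.0, scale=3.0, size=2000)   # artificial noise (one per data point)

def log_model(x, mu, log_z):
    # Unnormalised Gaussian-shaped model; log_z is the learned log-normaliser.
    return -0.5 * (x - mu) ** 2 - log_z

def log_noise_pdf(x):
    return -0.5 * (x / 3.0) ** 2 - np.log(3.0 * np.sqrt(2.0 * np.pi))

def nce_loss(mu, log_z):
    # Logistic loss of classifying data vs noise with logit G(x) = log p_model - log p_noise.
    g_data = log_model(data, mu, log_z) - log_noise_pdf(data)
    g_noise = log_model(noise, mu, log_z) - log_noise_pdf(noise)
    return np.mean(np.log1p(np.exp(-g_data))) + np.mean(np.log1p(np.exp(g_noise)))

# Crude fit over a parameter grid; NCE should roughly recover mu ~ 2 and
# log_z ~ log sqrt(2*pi) ~ 0.92 (the true log-normaliser).
grid = [(mu, lz) for mu in np.linspace(0.0, 4.0, 41) for lz in np.linspace(0.0, 2.0, 41)]
mu_hat, log_z_hat = min(grid, key=lambda p: nce_loss(*p))
print(f"fitted mean ~ {mu_hat:.2f}, fitted log-normaliser ~ {log_z_hat:.2f}")
```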
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Variance Regularization for Accelerating Stochastic Optimization [14.545770519120898]
We propose a universal principle which reduces the random error accumulation by exploiting statistical information hidden in mini-batch gradients.
This is achieved by regularizing the learning-rate according to mini-batch variances.
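One plausible instantiation of that idea, sketched below: shrink the step-size when the within-mini-batch gradient variance is large. The specific scaling rule and the toy regression problem are assumptions for illustration, not the cited paper's exact scheme.

```python
# Sketch: scale the learning rate by within-mini-batch gradient variance (NumPy).
# The scaling rule below is illustrative, not the cited paper's exact scheme.
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(1000)
w, base_lr = np.zeros(3), 0.5

for step in range(300):
    idx = rng.choice(len(X), size=32, replace=False)          # mini-batch
    per_example = X[idx] * (X[idx] @ w - y[idx])[:, None]     # per-example gradients (squared loss)
    g = per_example.mean(axis=0)
    var = per_example.var(axis=0).sum()                        # within-batch gradient variance
    lr = base_lr / (1.0 + var)                                 # damp the step when the batch is noisy
    w -= lr * g

print("estimated weights:", w)  # should approach [1.0, -2.0, 0.5]
```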
arXiv Detail & Related papers (2020-08-13T15:34:01Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster in finite time, but only up to an error.
Rather than fixing the mini-batch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)