Variance reduction for Random Coordinate Descent-Langevin Monte Carlo
- URL: http://arxiv.org/abs/2006.06068v4
- Date: Thu, 22 Oct 2020 01:18:23 GMT
- Title: Variance reduction for Random Coordinate Descent-Langevin Monte Carlo
- Authors: Zhiyan Ding and Qin Li
- Abstract summary: Langevin Monte Carlo (LMC), which provides fast convergence, requires the computation of gradients.
In practice one uses finite-differencing approximations as surrogates, and the method is expensive in high dimensions.
We introduce a new variance reduction approach, termed Randomized Coordinates Averaging Descent (RCAD), and incorporate it with both overdamped and underdamped LMC.
- Score: 7.464874233755718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling from a log-concave distribution is a core problem with wide
applications in Bayesian statistics and machine learning. While most
gradient-free methods have a slow convergence rate, Langevin Monte Carlo
(LMC) provides fast convergence but requires the computation of gradients. In
practice one uses finite-differencing approximations as surrogates, and the
method is expensive in high dimensions.
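The cost of the finite-differencing surrogate can be made concrete: a forward-difference approximation of a full gradient takes d + 1 function evaluations per iteration, so the per-step cost grows linearly with the dimension. A minimal sketch (the helper `fd_gradient` and the quadratic test function are illustrative, not from the paper):

```python
import numpy as np

def fd_gradient(f, x, h=1e-5):
    """Forward-difference gradient surrogate: d + 1 evaluations of f.

    Illustrates why finite differencing is expensive in high
    dimensions: one gradient costs d + 1 calls to f.
    """
    d = x.shape[0]
    fx = f(x)
    grad = np.empty(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        grad[i] = (f(x + e) - fx) / h
    return grad

# Example: f(x) = 0.5 * ||x||^2, whose true gradient is x.
f = lambda x: 0.5 * np.dot(x, x)
x = np.array([1.0, -2.0, 3.0])
g = fd_gradient(f, x)   # close to [1, -2, 3]
```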
A natural strategy to reduce computational cost in each iteration is to
utilize random gradient approximations, such as random coordinate descent (RCD)
or simultaneous perturbation stochastic approximation (SPSA). We show by a
counter-example that blindly applying RCD does not achieve the goal in the most
general setting. The high variance induced by the randomness means a larger
number of iterations is needed, which balances out the saving in each
iteration.
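The RCD surrogate and its variance problem can be sketched as follows, assuming the standard unbiased estimator that evaluates one random partial derivative and scales it by the dimension d; the function names and the Euler-Maruyama discretization are illustrative, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def rcd_gradient(grad_f, x):
    """Unbiased random-coordinate surrogate: pick one coordinate i,
    keep only the i-th partial derivative, and scale by d so that
    E[g] = grad f(x). The variance of g grows with d."""
    d = x.shape[0]
    i = rng.integers(d)
    g = np.zeros(d)
    g[i] = d * grad_f(x)[i]   # in practice only the i-th partial is evaluated
    return g

def overdamped_lmc_step(grad_f, x, step):
    """One Euler-Maruyama step of overdamped LMC with the RCD surrogate."""
    noise = rng.standard_normal(x.shape[0])
    return x - step * rcd_gradient(grad_f, x) + np.sqrt(2.0 * step) * noise

# Sampling a standard Gaussian: f(x) = 0.5 * ||x||^2, so grad f(x) = x.
grad_f = lambda y: y
x = np.array([1.0, -2.0, 3.0])
x = overdamped_lmc_step(grad_f, x, step=0.01)
```

Each step costs one partial derivative instead of d, but because the single-coordinate estimate is scaled by d its variance forces a smaller step size or more iterations, which is the trade-off the counter-example formalizes.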
We then introduce a new variance reduction approach, termed Randomized
Coordinates Averaging Descent (RCAD), and incorporate it with both overdamped
and underdamped LMC. The methods are termed RCAD-O-LMC and RCAD-U-LMC
respectively. The methods still sit in the random gradient approximation
framework, and thus the computational cost in each iteration is low. However,
by employing RCAD, the variance is reduced, so the methods converge within the
same number of iterations as the classical overdamped and underdamped LMC. This
leads to a computational saving overall.
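A rough sketch of the variance-reduction idea, in the spirit of RCAD-O-LMC but not the paper's exact update rule: maintain a stored gradient table, refresh one randomly chosen coordinate per step (so the per-iteration cost stays at one partial derivative), and drive overdamped LMC with the full stored vector. All names below are illustrative:

```python
import numpy as np

class CoordinateAveragedGradient:
    """Sketch in the spirit of RCAD (not the paper's exact rule):
    keep a stored gradient table and refresh one randomly chosen
    coordinate per call, so each step costs one partial derivative
    while the surrogate's variance stays low."""

    def __init__(self, grad_f, x0, seed=0):
        self.grad_f = grad_f
        self.table = grad_f(x0).copy()     # one full gradient at start-up
        self.rng = np.random.default_rng(seed)

    def estimate(self, x):
        i = self.rng.integers(x.shape[0])
        self.table[i] = self.grad_f(x)[i]  # only the i-th partial is needed
        return self.table

def rcad_style_lmc_step(est, x, step, rng):
    """One overdamped LMC step driven by the low-variance surrogate."""
    noise = rng.standard_normal(x.shape[0])
    return x - step * est.estimate(x) + np.sqrt(2.0 * step) * noise
```

Because the surrogate is a full (if slightly stale) gradient vector rather than a rescaled single coordinate, its variance does not blow up with the dimension, which is what lets the method keep the iteration count of classical LMC.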
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex Penalties [8.294148737585543]
This paper investigates sparse-penalized quantile regression in the presence of non-convex and non-smooth penalties.
We introduce a novel ADMM algorithm with an increasing penalty multiplier, named SIAD, specifically for sparse-penalized quantile regression.
arXiv Detail & Related papers (2023-09-04T21:48:51Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian approximated gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
arXiv Detail & Related papers (2020-10-03T16:22:15Z)
- Langevin Monte Carlo: random coordinate descent and variance reduction [7.464874233755718]
Langevin Monte Carlo (LMC) is a popular Bayesian sampling method.
We investigate how to enhance computational efficiency through the application of RCD (random coordinate descent) on LMC.
arXiv Detail & Related papers (2020-07-26T18:14:36Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in time, up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)
- Dual Stochastic Natural Gradient Descent and convergence of interior half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
arXiv Detail & Related papers (2020-01-19T00:53:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.