A New Framework for Variance-Reduced Hamiltonian Monte Carlo
- URL: http://arxiv.org/abs/2102.04613v1
- Date: Tue, 9 Feb 2021 02:44:24 GMT
- Title: A New Framework for Variance-Reduced Hamiltonian Monte Carlo
- Authors: Zhengmian Hu, Feihu Huang, Heng Huang
- Abstract summary: We propose a new framework of variance-reduced Hamiltonian Monte Carlo (HMC) methods for sampling from an $L$-smooth and $m$-strongly log-concave distribution.
We show that the unbiased gradient estimators, including SAGA and SVRG, based HMC methods achieve highest gradient efficiency with small batch size.
Experimental results on both synthetic and real-world benchmark data show that our new framework significantly outperforms the full gradient and gradient HMC approaches.
- Score: 88.84622104944503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new framework of variance-reduced Hamiltonian Monte Carlo (HMC)
methods for sampling from an $L$-smooth and $m$-strongly log-concave
distribution, based on a unified formulation of biased and unbiased variance
reduction methods. We study the convergence properties for HMC with gradient
estimators which satisfy the Mean-Squared-Error-Bias (MSEB) property. We show
that the unbiased gradient estimators, including SAGA and SVRG, based HMC
methods achieve highest gradient efficiency with small batch size under high
precision regime, and require $\tilde{O}(N + \kappa^2 d^{\frac{1}{2}}
\varepsilon^{-1} + N^{\frac{2}{3}} \kappa^{\frac{4}{3}} d^{\frac{1}{3}}
\varepsilon^{-\frac{2}{3}} )$ gradient complexity to achieve
$\epsilon$-accuracy in 2-Wasserstein distance. Moreover, our HMC methods with
biased gradient estimators, such as SARAH and SARGE, require
$\tilde{O}(N+\sqrt{N} \kappa^2 d^{\frac{1}{2}} \varepsilon^{-1})$ gradient
complexity, which has the same dependency on condition number $\kappa$ and
dimension $d$ as full gradient method, but improves the dependency of sample
size $N$ for a factor of $N^\frac{1}{2}$. Experimental results on both
synthetic and real-world benchmark data show that our new framework
significantly outperforms the full gradient and stochastic gradient HMC
approaches. The earliest version of this paper was submitted to ICML 2020 with
three weak accept but was not finally accepted.
Related papers
- Provable Policy Gradient Methods for Average-Reward Markov Potential
Games [15.136494705127564]
We study Markov potential games under the infinite horizon average reward criterion.
We prove that both algorithms based on independent policy gradient and independent natural policy gradient converge globally to a Nash equilibrium for the average reward criterion.
arXiv Detail & Related papers (2024-03-09T00:20:33Z) - Unbiased Kinetic Langevin Monte Carlo with Inexact Gradients [0.8749675983608172]
We present an unbiased method for posterior means based on kinetic Langevin dynamics.
Our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem.
Our results demonstrate that in large-scale applications, the unbiased algorithm we present can be 2-3 orders of magnitude more efficient than the gold-standard" randomized Hamiltonian Monte Carlo.
arXiv Detail & Related papers (2023-11-08T21:19:52Z) - Settling the Sample Complexity of Model-Based Offline Reinforcement
Learning [50.5790774201146]
offline reinforcement learning (RL) learns using pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, em variance controlled gradient (VCSG), to improve the performance of the reduced gradient (SVRG)
$lambda$ is introduced in VCSG to avoid over-reducing a variance by SVRG.
$mathcalO(min1/epsilon3/2,n1/4/epsilon)$ number of gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - Tightening the Dependence on Horizon in the Sample Complexity of
Q-Learning [59.71676469100807]
This work sharpens the sample complexity of synchronous Q-learning to an order of $frac|mathcalS|| (1-gamma)4varepsilon2$ for any $0varepsilon 1$.
Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
arXiv Detail & Related papers (2021-02-12T14:22:05Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Fast decentralized non-convex finite-sum optimization with recursive
variance reduction [19.540926205375857]
We describe a first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique.
In particular, in a big-data regime such that $n = O(Nfrac12(lambda)3)$, this complexitys reduces to $O(Nfrac12Lepsilon-2)$, independent of the network complexity.
In addition, we show appropriate choices of local minibatch size balance the trade-offs between gradient complexity and communication complexity.
arXiv Detail & Related papers (2020-08-17T15:51:32Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.