Newtonian Monte Carlo: single-site MCMC meets second-order gradient
methods
- URL: http://arxiv.org/abs/2001.05567v1
- Date: Wed, 15 Jan 2020 21:40:50 GMT
- Title: Newtonian Monte Carlo: single-site MCMC meets second-order gradient
methods
- Authors: Nimar S. Arora, Nazanin Khosravani Tehrani, Kinjal Divesh Shah,
Michael Tingley, Yucen Lily Li, Narjes Torabi, David Noursi, Sepehr Akhavan
Masouleh, Eric Lippert, Erik Meijer
- Abstract summary: Single-site Markov Chain Monte Carlo (MCMC) is a variant of MCMC in which a single coordinate in the state space is modified in each step.
Newtonian Monte Carlo (NMC) is a method to improve MCMC convergence by analyzing the first and second order gradients of the target density.
NMC is similar to the Newton-Raphson update in optimization where the second order gradient is used to automatically scale the step size in each dimension.
- Score: 1.6042895233470602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-site Markov Chain Monte Carlo (MCMC) is a variant of MCMC in which a
single coordinate in the state space is modified in each step. Structured
relational models are a good candidate for this style of inference. In the
single-site context, second order methods become feasible because the typical
cubic cost associated with these methods is now restricted to the dimension of
each coordinate. Our work, which we call Newtonian Monte Carlo (NMC), is a
method to improve MCMC convergence by analyzing the first and second order
gradients of the target density to determine a suitable proposal density at
each point. Existing first order gradient-based methods suffer from the problem
of determining an appropriate step size: too small a step size takes a
large number of steps to converge, while too large a step size causes the
sampler to overshoot the high-density region. NMC is similar to the Newton-Raphson
update in optimization where the second order gradient is used to automatically
scale the step size in each dimension. However, our objective is to find a
parameterized proposal density rather than a maximum.
As a further improvement on existing first and second order methods, we show
that random variables with constrained supports do not need to be transformed
before taking a gradient step. We demonstrate the efficiency of NMC on a number
of different domains. For statistical models where the prior is conjugate to
the likelihood, our method recovers the posterior quite trivially in one step.
However, we also show results on fairly large non-conjugate models, where NMC
performs better than adaptive first order methods such as NUTS or other inexact
scalable inference methods such as Stochastic Variational Inference or
bootstrapping.
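To make the Newton-style proposal in the abstract concrete: for a real-valued coordinate x of the target density π, write g = ∇ log π(x) and H = ∇² log π(x). A sketch of such a proposal, assuming H is negative definite (the paper also treats constrained supports with other proposal families), is the Gaussian centered at the Newton step:

```latex
% Sketch of a Newton-style single-site proposal (notation introduced here, not taken from the paper):
q(\cdot \mid x) = \mathcal{N}\!\left(x - H^{-1} g,\; -H^{-1}\right),
\qquad g = \nabla \log \pi(x), \quad H = \nabla^{2} \log \pi(x).
```

A minimal single-site step built from this proposal, with a Metropolis-Hastings correction, could look as follows. This is an illustrative sketch, not the paper's implementation; the function name nmc_single_site_step and the NumPy-based interface are assumptions.

```python
import numpy as np

def nmc_single_site_step(x, log_density, grad, hess, rng):
    """One Newton-style MCMC step for a single real-valued coordinate block x (1-D array).

    Sketch only: assumes hess(x) is negative definite so that -H^{-1} is a valid
    proposal covariance; constrained supports need other proposal families.
    """
    g, H = grad(x), hess(x)
    cov = -np.linalg.inv(H)          # proposal covariance: -H^{-1}
    mean = x + cov @ g               # proposal mean: Newton step x - H^{-1} g
    x_new = rng.multivariate_normal(mean, cov)

    # Parameters of the reverse proposal, needed for the Metropolis-Hastings ratio.
    g2, H2 = grad(x_new), hess(x_new)
    cov2 = -np.linalg.inv(H2)
    mean2 = x_new + cov2 @ g2

    def log_q(y, m, c):              # log density of N(m, c) evaluated at y
        d = y - m
        _, logdet = np.linalg.slogdet(c)
        return -0.5 * (d @ np.linalg.solve(c, d) + logdet + len(y) * np.log(2 * np.pi))

    log_alpha = (log_density(x_new) - log_density(x)
                 + log_q(x, mean2, cov2) - log_q(x_new, mean, cov))
    return x_new if np.log(rng.uniform()) < log_alpha else x

# Example: a standard normal target, where the proposal equals the target
# and every proposed point is accepted.
rng = np.random.default_rng(0)
logp = lambda v: -0.5 * float(v @ v)
x = np.array([3.0])
for _ in range(5):
    x = nmc_single_site_step(x, logp, lambda v: -v, lambda v: -np.eye(len(v)), rng)
```

In the Gaussian case the proposal coincides with the target itself, which is consistent with the abstract's remark that conjugate posteriors are recovered in essentially one step.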
Related papers
- AutoStep: Locally adaptive involutive MCMC [51.186543293659376]
AutoStep MCMC selects an appropriate step size at each iteration adapted to the local geometry of the target distribution.
We show that AutoStep MCMC is competitive with state-of-the-art methods in terms of effective sample size per unit cost.
arXiv Detail & Related papers (2024-10-24T17:17:11Z)
- Gaussian process regression and conditional Karhunen-Loève models for data assimilation in inverse problems [68.8204255655161]
We present a model inversion algorithm, CKLEMAP, for data assimilation and parameter estimation in partial differential equation models.
The CKLEMAP method provides better scalability compared to the standard MAP method.
arXiv Detail & Related papers (2023-01-26T18:14:12Z)
- Neural network quantum state with proximal optimization: a ground-state searching scheme based on variational Monte Carlo [4.772126473623257]
We introduce a novel objective function with proximal optimization (PO) that enables multiple updates via reusing the mismatched samples.
We investigate the performance of our VMC-PO algorithm for ground-state searching with a 1-dimensional transverse-field Ising model and 2-dimensional Heisenberg antiferromagnet on a square lattice.
arXiv Detail & Related papers (2022-10-29T04:55:39Z)
- Super-model ecosystem: A domain-adaptation perspective [101.76769818069072]
This paper attempts to establish the theoretical foundation for the emerging super-model paradigm via domain adaptation.
Super-model paradigms help reduce computational and data cost and carbon emission, which is critical to AI industry.
arXiv Detail & Related papers (2022-08-30T09:09:43Z)
- DRSOM: A Dimension Reduced Second-Order Method [13.778619250890406]
Under a trust-like framework, our method preserves the convergence of the second-order method while using only information in a few directions.
Theoretically, we show that the method has local convergence and a global convergence rate of $O(\epsilon^{-3/2})$ for satisfying the first-order and second-order conditions.
arXiv Detail & Related papers (2022-07-30T13:05:01Z)
- The split Gibbs sampler revisited: improvements to its algorithmic structure and augmented target distribution [1.1279808969568252]
Current state-of-the-art methods often address these difficulties by replacing the posterior density with a smooth approximation.
An alternative approach is based on data augmentation and relaxation, where auxiliary variables are introduced in order to construct an approximate augmented posterior distribution.
This paper proposes a new accelerated proximal MCMC method called latent space SK-ROCK, which tightly combines the benefits of the two strategies.
arXiv Detail & Related papers (2022-06-28T11:21:41Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian approximated gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
arXiv Detail & Related papers (2020-10-03T16:22:15Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch or the step size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Adaptive Gradient Methods Converge Faster with Over-Parameterization
(but you should do a line-search) [32.24244211281863]
We study a simplistic setting: smooth, convex losses with models over-parameterized enough to interpolate the data.
We prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate.
We show that these techniques improve the convergence and generalization of adaptive gradient methods across tasks.
arXiv Detail & Related papers (2020-06-11T21:23:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of any of the information presented and is not responsible for any consequences of its use.