Newtonian Monte Carlo: single-site MCMC meets second-order gradient
methods
- URL: http://arxiv.org/abs/2001.05567v1
- Date: Wed, 15 Jan 2020 21:40:50 GMT
- Title: Newtonian Monte Carlo: single-site MCMC meets second-order gradient
methods
- Authors: Nimar S. Arora, Nazanin Khosravani Tehrani, Kinjal Divesh Shah,
Michael Tingley, Yucen Lily Li, Narjes Torabi, David Noursi, Sepehr Akhavan
Masouleh, Eric Lippert, Erik Meijer
- Abstract summary: Single-site Markov Chain Monte Carlo (MCMC) is a variant of MCMC in which a single coordinate in the state space is modified in each step.
Newtonian Monte Carlo (NMC) is a method to improve MCMC convergence by analyzing the first and second order gradients of the target density.
NMC is similar to the Newton-Raphson update in optimization where the second order gradient is used to automatically scale the step size in each dimension.
- Score: 1.6042895233470602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-site Markov Chain Monte Carlo (MCMC) is a variant of MCMC in which a
single coordinate in the state space is modified in each step. Structured
relational models are a good candidate for this style of inference. In the
single-site context, second order methods become feasible because the typical
cubic cost associated with these methods is now restricted to the dimension of
each coordinate. Our work, which we call Newtonian Monte Carlo (NMC), is a
method to improve MCMC convergence by analyzing the first and second order
gradients of the target density to determine a suitable proposal density at
each point. Existing first order gradient-based methods suffer from the problem
of determining an appropriate step size: too small a step size takes a
large number of steps to converge, while too large a step size causes the
sampler to overshoot the high-density region. NMC is similar to the Newton-Raphson
update in optimization where the second order gradient is used to automatically
scale the step size in each dimension. However, our objective is to find a
parameterized proposal density rather than a maximum.
As a further improvement on existing first and second order methods, we show
that random variables with constrained supports do not need to be transformed
before taking a gradient step. We demonstrate the efficiency of NMC on a number
of different domains. For statistical models where the prior is conjugate to
the likelihood, our method recovers the posterior quite trivially in one step.
However, we also show results on fairly large non-conjugate models, where NMC
performs better than adaptive first order methods such as NUTS or other inexact
scalable inference methods such as Stochastic Variational Inference or
bootstrapping.
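To make the Newton-style proposal in the abstract concrete: for a real-valued coordinate x of the target density π, write g = ∇ log π(x) and H = ∇² log π(x). A sketch of such a proposal, assuming H is negative definite (the paper also treats constrained supports with other proposal families), is the Gaussian centered at the Newton step:

```latex
% Sketch of a Newton-style single-site proposal (notation introduced here, not taken from the paper):
q(\cdot \mid x) = \mathcal{N}\!\left(x - H^{-1} g,\; -H^{-1}\right),
\qquad g = \nabla \log \pi(x), \quad H = \nabla^{2} \log \pi(x).
```

A minimal single-site step built from this proposal, with a Metropolis-Hastings correction, could look as follows. This is an illustrative sketch, not the paper's implementation; the function name nmc_single_site_step and the NumPy-based interface are assumptions.

```python
import numpy as np

def nmc_single_site_step(x, log_density, grad, hess, rng):
    """One Newton-style MCMC step for a single real-valued coordinate block x (1-D array).

    Sketch only: assumes hess(x) is negative definite so that -H^{-1} is a valid
    proposal covariance; constrained supports need other proposal families.
    """
    g, H = grad(x), hess(x)
    cov = -np.linalg.inv(H)          # proposal covariance: -H^{-1}
    mean = x + cov @ g               # proposal mean: Newton step x - H^{-1} g
    x_new = rng.multivariate_normal(mean, cov)

    # Parameters of the reverse proposal, needed for the Metropolis-Hastings ratio.
    g2, H2 = grad(x_new), hess(x_new)
    cov2 = -np.linalg.inv(H2)
    mean2 = x_new + cov2 @ g2

    def log_q(y, m, c):              # log density of N(m, c) evaluated at y
        d = y - m
        _, logdet = np.linalg.slogdet(c)
        return -0.5 * (d @ np.linalg.solve(c, d) + logdet + len(y) * np.log(2 * np.pi))

    log_alpha = (log_density(x_new) - log_density(x)
                 + log_q(x, mean2, cov2) - log_q(x_new, mean, cov))
    return x_new if np.log(rng.uniform()) < log_alpha else x

# Example: a standard normal target, where the proposal equals the target
# and every proposed point is accepted.
rng = np.random.default_rng(0)
logp = lambda v: -0.5 * float(v @ v)
x = np.array([3.0])
for _ in range(5):
    x = nmc_single_site_step(x, logp, lambda v: -v, lambda v: -np.eye(len(v)), rng)
```

In the Gaussian case the proposal coincides with the target itself, which is consistent with the abstract's remark that conjugate posteriors are recovered in essentially one step.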
Related papers
- AutoStep: Locally adaptive involutive MCMC [51.186543293659376]
AutoStep MCMC selects an appropriate step size at each iteration adapted to the local geometry of the target distribution.
We show that AutoStep MCMC is competitive with state-of-the-art methods in terms of effective sample size per unit cost.
arXiv Detail & Related papers (2024-10-24T17:17:11Z)
- Gaussian process regression and conditional Karhunen-Loève models for data assimilation in inverse problems [68.8204255655161]
We present a model inversion algorithm, CKLEMAP, for data assimilation and parameter estimation in partial differential equation models.
The CKLEMAP method provides better scalability compared to the standard MAP method.
arXiv Detail & Related papers (2023-01-26T18:14:12Z)
- Neural network quantum state with proximal optimization: a ground-state searching scheme based on variational Monte Carlo [4.772126473623257]
We introduce a novel objective function with proximal optimization (PO) that enables multiple updates via reusing the mismatched samples.
We investigate the performance of our VMC-PO algorithm for ground-state searching with a 1-dimensional transverse-field Ising model and 2-dimensional Heisenberg antiferromagnet on a square lattice.
arXiv Detail & Related papers (2022-10-29T04:55:39Z)
- Super-model ecosystem: A domain-adaptation perspective [101.76769818069072]
This paper attempts to establish the theoretical foundation for the emerging super-model paradigm via domain adaptation.
Super-model paradigms help reduce computational and data cost and carbon emission, which is critical to AI industry.
arXiv Detail & Related papers (2022-08-30T09:09:43Z)
- DRSOM: A Dimension Reduced Second-Order Method [13.778619250890406]
Under a trust-like framework, our method preserves the convergence of the second-order method while using only information in a few directions.
Theoretically, we show that the method has local convergence and a global convergence rate of $O(\epsilon^{-3/2})$ for satisfying the first-order and second-order conditions.
arXiv Detail & Related papers (2022-07-30T13:05:01Z)
- The split Gibbs sampler revisited: improvements to its algorithmic structure and augmented target distribution [1.1279808969568252]
Current state-of-the-art methods often address these difficulties by replacing the posterior density with a smooth approximation.
An alternative approach is based on data augmentation and relaxation, where auxiliary variables are introduced in order to construct an approximate augmented posterior distribution.
This paper proposes a new accelerated proximal MCMC method called latent space SK-ROCK, which tightly combines the benefits of the two strategies.
arXiv Detail & Related papers (2022-06-28T11:21:41Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian approximated gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
arXiv Detail & Related papers (2020-10-03T16:22:15Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch or the step size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Adaptive Gradient Methods Converge Faster with Over-Parameterization
(but you should do a line-search) [32.24244211281863]
We study a simplistic setting: smooth, convex losses with models over-parameterized enough to interpolate the data.
We prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate.
We show that these techniques improve the convergence and generalization of adaptive gradient methods across tasks.
arXiv Detail & Related papers (2020-06-11T21:23:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of any of the information presented and is not responsible for any consequences of its use.