Related papers: Convergence Acceleration of Markov Chain Monte Carlo-based Gradient Descent by Deep Unfolding

Convergence Acceleration of Markov Chain Monte Carlo-based Gradient Descent by Deep Unfolding

URL: http://arxiv.org/abs/2402.13608v1
Date: Wed, 21 Feb 2024 08:21:48 GMT
Title: Convergence Acceleration of Markov Chain Monte Carlo-based Gradient Descent by Deep Unfolding
Authors: Ryo Hagiwara and Satoshi Takabe
Abstract summary: This study proposes a trainable sampling-based solver for optimization problems (COPs) using a deep-learning technique called deep unfolding. The proposed solver is based on the Ohzeki method that combines Markov-chain Monte-Carlo (MCMC) and gradient descent. The numerical results for a few COPs demonstrated that the proposed solver significantly accelerated the convergence speed compared with the original Ohzeki method.
Score: 5.584060970507506
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study proposes a trainable sampling-based solver for combinatorial optimization problems (COPs) using a deep-learning technique called deep unfolding. The proposed solver is based on the Ohzeki method that combines Markov-chain Monte-Carlo (MCMC) and gradient descent, and its step sizes are trained by minimizing a loss function. In the training process, we propose a sampling-based gradient estimation that substitutes auto-differentiation with a variance estimation, thereby circumventing the failure of back propagation due to the non-differentiability of MCMC. The numerical results for a few COPs demonstrated that the proposed solver significantly accelerated the convergence speed compared with the original Ohzeki method.

Related papers

Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles [3.2254941904559917]
We introduce a scalable variant by incorporating Gradient Hamiltonian Monte Carlo proposals into Sequential Monte Carlo (SMC) methods.<n>Our resulting SMCSGHMC algorithm outperforms gradient descent ensembles across image classification, out-of-distribution detection, and transfer learning tasks.
arXiv Detail & Related papers (2025-05-16T20:10:04Z)
Scaled Conjugate Gradient Method for Nonconvex Optimization in Deep Neural Networks [0.6144680854063939]
A scaled conjugate gradient method is proposed for solving non optimization problems with deep neural networks. The proposed method is shown to be superior to that of the conjugate gradient method in practical applications of image and text classification.
arXiv Detail & Related papers (2024-12-16T02:57:23Z)
Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models [50.90868087591973]
We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models. We test the proposed framework by comparing it with the iterative ensemble smoother and deep ensembling methods for a non-linear diffusion equation.
arXiv Detail & Related papers (2024-08-20T19:06:02Z)
Sequential Monte Carlo for Inclusive KL Minimization in Amortized Variational Inference [3.126959812401426]
We propose SMC-Wake, a procedure for fitting an amortized variational approximation that uses sequential Monte Carlo samplers to estimate the gradient of the inclusive KL divergence. In experiments with both simulated and real datasets, SMC-Wake fits variational distributions that approximate the posterior more accurately than existing methods.
arXiv Detail & Related papers (2024-03-15T18:13:48Z)
Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo [30.4930148381328]
Diffusion-based Monte Carlo (DMC) is a method to sample from a general target distribution beyond the isoperimetric condition. DMC encountered high gradient complexity, resulting in an exponential dependency on the error tolerance $epsilon$ of the obtained samples. We propose RS-DMC, based on a novel recursion-based score estimation method. Our algorithm is provably much faster than the popular Langevin-based algorithms.
arXiv Detail & Related papers (2024-01-12T02:33:57Z)
Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when emphdone right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective. We introduce a emphstochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices. Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
Sparse Backpropagation for MoE Training [118.31785160874024]
We introduce SparseMixer, a scalable gradient estimator that bridges the gap between backpropagation and sparse expert routing. Grounded in a numerical ODE framework, SparseMixer harnesses the mid-point method, a second-order ODE solver, to deliver precise gradient approximations. Applying SparseMixer to Switch Transformer on both pre-training and machine translation tasks, SparseMixer showcases considerable performance gain.
arXiv Detail & Related papers (2023-10-01T22:43:57Z)
A Homogenization Approach for Gradient-Dominated Stochastic Optimization [6.1144486886258065]
We propose a homogeneous second-order descent method (SHSOD) for functions enjoying gradient dominance. Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated optimization.
arXiv Detail & Related papers (2023-08-21T11:03:04Z)
Reverse Diffusion Monte Carlo [19.35592726471155]
We propose a novel Monte Carlo sampling algorithm called reverse diffusion Monte Carlo (rdMC) rdMC is distinct from the Markov chain Monte Carlo (MCMC) methods.
arXiv Detail & Related papers (2023-07-05T05:42:03Z)
An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem. A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network. The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
Bayesian Pseudo-Coresets via Contrastive Divergence [5.479797073162603]
We introduce a novel approach for constructing pseudo-coresets by utilizing contrastive divergence. It eliminates the need for approximations in the pseudo-coreset construction process. We conduct extensive experiments on multiple datasets, demonstrating its superiority over existing BPC techniques.
arXiv Detail & Related papers (2023-03-20T17:13:50Z)
Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. The proposed method, equipped with an average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC [83.48593305367523]
Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions. We introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions. We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.
arXiv Detail & Related papers (2021-02-04T02:21:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.