Improved Langevin Monte Carlo for stochastic optimization via landscape
modification
- URL: http://arxiv.org/abs/2302.03973v1
- Date: Wed, 8 Feb 2023 10:08:37 GMT
- Title: Improved Langevin Monte Carlo for stochastic optimization via landscape
modification
- Authors: Michael C. H. Choi, Youjia Wang
- Abstract summary: Given a target function $H$ to minimize or a target Gibbs distribution $\pi_{\beta}^0 \propto e^{-\beta H}$ to sample from at low temperature, this paper proposes and analyzes Langevin Monte Carlo (LMC) algorithms that run on an alternative landscape.
We show that the energy barrier of the transformed landscape is reduced, which consequently leads to polynomial dependence on both $\beta$ and $M$ in the modified Log-Sobolev constant associated with $\pi^f_{\beta,c,1}$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a target function $H$ to minimize or a target Gibbs distribution
$\pi_{\beta}^0 \propto e^{-\beta H}$ to sample from in the low temperature, in
this paper we propose and analyze Langevin Monte Carlo (LMC) algorithms that
run on an alternative landscape as specified by $H^f_{\beta,c,1}$ and target a
modified Gibbs distribution $\pi^f_{\beta,c,1} \propto e^{-\beta
H^f_{\beta,c,1}}$, where the landscape of $H^f_{\beta,c,1}$ is a transformed
version of that of $H$ which depends on the parameters $f,\beta$ and $c$. While
the original Log-Sobolev constant affiliated with $\pi^0_{\beta}$ exhibits
exponential dependence on both $\beta$ and the energy barrier $M$ in the low
temperature regime, with appropriate tuning of these parameters and subject to
assumptions on $H$, we prove that the energy barrier of the transformed
landscape is reduced which consequently leads to polynomial dependence on both
$\beta$ and $M$ in the modified Log-Sobolev constant associated with
$\pi^f_{\beta,c,1}$. This yields improved total variation mixing time bounds and
improved convergence toward a global minimum of $H$. We stress that the
technique developed in this paper is not limited to LMC but is broadly
applicable to other gradient-based optimization or sampling algorithms.
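To make the construction concrete, below is a minimal sketch of unadjusted LMC run on both the original potential and a transformed one. The abstract does not spell out the exact form of $H^f_{\beta,c,1}$, so the transform used here (damping the gradient of $H$ above a threshold level $c$, which flattens the barrier) is only an illustrative stand-in, and the function names (`grad_H`, `grad_H_transformed`, `lmc`) are hypothetical.

```python
import numpy as np

def grad_H(x):
    # Hypothetical double-well potential H(x) = (x^2 - 1)^2 with minima at x = +/-1.
    return 4.0 * x**3 - 4.0 * x

def grad_H_transformed(x, beta, c):
    # Illustrative stand-in for the gradient of a transformed landscape:
    # the gradient of G(H(x)) with G'(u) = 1 / (1 + beta * (u - c)_+),
    # which damps the slope wherever H exceeds the level c and so lowers
    # the effective barrier between wells.
    H = (x**2 - 1.0)**2
    damp = 1.0 / (1.0 + beta * np.maximum(H - c, 0.0))
    return damp * grad_H(x)

def lmc(grad, x0, beta, step, n_iters, rng):
    # Unadjusted Langevin Monte Carlo: x <- x - step*grad(x) + sqrt(2*step/beta)*xi.
    x = x0
    for _ in range(n_iters):
        xi = rng.standard_normal()
        x = x - step * grad(x) + np.sqrt(2.0 * step / beta) * xi
    return x

rng = np.random.default_rng(0)
beta, c = 10.0, 0.0  # low temperature; c chosen near the minimum value of H
x_plain = lmc(grad_H, x0=-1.0, beta=beta, step=1e-3, n_iters=50_000, rng=rng)
x_mod = lmc(lambda x: grad_H_transformed(x, beta, c), x0=-1.0,
            beta=beta, step=1e-3, n_iters=50_000, rng=rng)
print(x_plain, x_mod)
```

With a small step size and large $\beta$, the chain on the flattened landscape should cross the barrier between the two wells more readily than plain LMC; this is the qualitative behaviour that the improved Log-Sobolev and mixing time bounds quantify.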
Related papers
- Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction [57.93371273485736]
We consider a centralized distributed learning setup where all workers jointly minimize a common objective, and construct a new lower bound showing that the $L \Delta \epsilon^{-2}$ complexity cannot be improved by more than poly-logarithmic factors in $n$, even in the homogeneous (i.i.d.) case where all workers access the same distribution.
arXiv Detail & Related papers (2025-06-30T13:27:39Z) - A PDE-Based Image Dehazing Method via Atmospheric Scattering Theory [2.17172315573773]
This paper presents a novel partial differential equation (PDE) framework for single-image dehazing. By integrating the atmospheric scattering model with nonlocal regularization and the dark channel prior, we propose an improved PDE.
arXiv Detail & Related papers (2025-06-10T13:43:09Z) - Corner Gradient Descent [13.794391803767617]
We show that rates up to $O(t^{-2\zeta})$ can be achieved by a generalized stationary SGD with infinite memory. We show that ideal corner algorithms can be efficiently approximated by finite-memory algorithms.
arXiv Detail & Related papers (2025-04-16T22:39:41Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes a convergence rate of $\frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\big[\|\nabla f(x_k)\|_1\big] \le O\big(\frac{\sqrt{d}}{T^{1/4}}\big)$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\big[\|\nabla f(x_k)\|_2^2\big] \le O\big(\frac{1}{\sqrt{T}}\big)$ rate of SGD.
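For context, here is a generic sketch of the RMSProp update (and a heavy-ball style momentum variant) analyzed in results of this kind; the hyperparameters and function names are illustrative, not the paper's exact algorithm or step-size schedule.

```python
import numpy as np

def rmsprop_step(x, grad, v, lr=1e-3, rho=0.99, eps=1e-8):
    # One RMSProp step: scale the raw gradient by a running root-mean-square
    # of past gradients, coordinate-wise.
    v = rho * v + (1.0 - rho) * grad**2
    x = x - lr * grad / (np.sqrt(v) + eps)
    return x, v

def rmsprop_momentum_step(x, grad, v, m, lr=1e-3, rho=0.99, mu=0.9, eps=1e-8):
    # Illustrative momentum extension: keep a heavy-ball buffer on the scaled gradient.
    v = rho * v + (1.0 - rho) * grad**2
    m = mu * m + grad / (np.sqrt(v) + eps)
    x = x - lr * m
    return x, v, m
```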
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature
and Bayesian Optimization [51.533164528799084]
We show that to estimate the normalizing constant within a small relative error, the level of difficulty depends on the value of $\lambda$.
We find that this pattern holds true even when the function evaluations are noisy.
arXiv Detail & Related papers (2024-01-11T07:45:09Z) - Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space [10.292118864147097]
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods.
Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$.
As a byproduct of our analysis, we obtain the first end-to-end analysis for gradient-based algorithms for MFVI.
arXiv Detail & Related papers (2023-12-05T16:02:04Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Local Max-Entropy and Free Energy Principles Solved by Belief
Propagation [0.0]
A statistical system is classically defined on a set of microstates $E$ by a global energy function $H : E \to \mathbb{R}$, yielding Gibbs probability measures $\rho^{\beta}(H)$ for every inverse temperature $\beta = T^{-1}$.
We show that the generalized belief propagation algorithm solves a collection of local variational principles by converging to critical points of Bethe-Kikuchi approximations of the free energy $F(\beta)$, the Shannon entropy $S(\mathcal{U})$, and the variational free energy.
arXiv Detail & Related papers (2022-07-02T14:20:40Z) - Faster Sampling from Log-Concave Distributions over Polytopes via a
Soft-Threshold Dikin Walk [28.431572772564518]
We consider the problem of sampling from a $d$-dimensional log-concave distribution $\pi(\theta) \propto e^{-f(\theta)}$ constrained to a polytope $K$ defined by $m$ inequalities.
Our main result is a "soft-threshold" variant of the Dikin walk Markov chain that requires at most $O\big((md + d L^2 R^2) \times md^{\omega-1}\big) \log(\frac{w}{\delta})$ arithmetic operations to sample from $\pi$.
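As a point of reference, here is a sketch of the classical (hard-constraint) Dikin walk with a Metropolis correction; it is not the paper's soft-threshold variant, and the function names and parameters are illustrative assumptions.

```python
import numpy as np

def dikin_walk(f, A, b, x0, r=0.5, n_iters=1000, seed=0):
    # Classical Dikin walk with a Metropolis correction, targeting
    # pi(theta) proportional to exp(-f(theta)) on the polytope {x : A x <= b}.
    rng = np.random.default_rng(seed)
    d = len(x0)
    x = np.array(x0, dtype=float)

    def hessian(x):
        s = b - A @ x                        # slacks of the m inequality constraints
        return A.T @ (A / s[:, None] ** 2)   # log-barrier Hessian A^T diag(s)^-2 A

    def log_proposal(x_from, x_to, H):
        # log density of N(x_from, (r^2/d) * H^{-1}) at x_to, up to shared constants
        diff = x_to - x_from
        _, logdet = np.linalg.slogdet(H)
        return 0.5 * logdet - (d / (2.0 * r**2)) * diff @ H @ diff

    for _ in range(n_iters):
        Hx = hessian(x)
        L = np.linalg.cholesky(np.linalg.inv(Hx))
        z = x + (r / np.sqrt(d)) * (L @ rng.standard_normal(d))
        if np.any(A @ z >= b):               # proposal left the polytope: reject
            continue
        Hz = hessian(z)
        log_acc = -f(z) + f(x) + log_proposal(z, x, Hz) - log_proposal(x, z, Hx)
        if np.log(rng.uniform()) < min(0.0, log_acc):
            x = z
    return x
```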
arXiv Detail & Related papers (2022-06-19T11:33:07Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex, in a distributed setting under similarity.
We propose a method that solves this problem with guarantees on both master-to-agents communication and local gradient calls.
The resulting bounds are much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ ones of prior methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Continuous Submodular Maximization: Boosting via Non-oblivious Function [12.755674710719616]
In this paper, we revisit constrained continuous submodular maximization in both offline and online settings.
We use a factor-revealing optimization problem to derive an optimal auxiliary function $F$ for the problem $\max_{\boldsymbol{x}\in\mathcal{C}} f(\boldsymbol{x})$.
In the online setting, we propose a boosting gradient feedback algorithm achieving a regret of $O(\sqrt{D})$ (where $D$ is the sum of delays of the gradient feedback) against adversarial objectives.
arXiv Detail & Related papers (2022-01-03T15:10:17Z) - Minimax Optimal Regression over Sobolev Spaces via Laplacian
Regularization on Neighborhood Graphs [25.597646488273558]
We study the statistical properties of Laplacian smoothing, a graph-based approach to nonparametric regression.
We prove that Laplacian smoothing is manifold-adaptive.
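A minimal sketch of Laplacian smoothing on a $k$-nearest-neighbor graph, assuming the standard formulation $\hat f = \arg\min_f \|f - y\|_2^2 + \lambda f^\top L f$; the helper names are illustrative and this is not necessarily the paper's exact estimator or graph construction.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix, diags, identity
from scipy.sparse.linalg import spsolve

def laplacian_smoothing(X, y, k=10, lam=1.0):
    # Nonparametric regression by penalizing roughness over a k-NN graph:
    # fhat = argmin_f ||f - y||^2 + lam * f^T L f, i.e. solve (I + lam*L) fhat = y.
    n = len(X)
    _, idx = cKDTree(X).query(X, k=k + 1)        # first neighbor is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()
    W = csr_matrix((np.ones(n * k), (rows, cols)), shape=(n, n))
    W = W.maximum(W.T)                           # symmetrize the adjacency matrix
    L = diags(np.asarray(W.sum(axis=1)).ravel()) - W   # unnormalized graph Laplacian
    return spsolve((identity(n) + lam * L).tocsr(), y)
```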
arXiv Detail & Related papers (2021-06-03T01:20:41Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + \big(\log(d)/(\varepsilon n)\big)^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - A New Framework for Variance-Reduced Hamiltonian Monte Carlo [88.84622104944503]
We propose a new framework of variance-reduced Hamiltonian Monte Carlo (HMC) methods for sampling from an $L$-smooth and $m$-strongly log-concave distribution.
We show that HMC methods based on unbiased gradient estimators, including SAGA and SVRG, achieve the highest gradient efficiency with small batch sizes.
Experimental results on both synthetic and real-world benchmark data show that our new framework significantly outperforms the full-gradient and stochastic-gradient HMC approaches.
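For illustration, here is a sketch of the two ingredients such frameworks combine, an SVRG-type unbiased gradient estimator and a leapfrog integrator; how they are composed below is an assumption for exposition, not the paper's exact scheme.

```python
import numpy as np

def svrg_gradient(theta, theta_snap, full_grad_snap, grad_i, idx):
    # SVRG-style unbiased estimator of grad U(theta), assuming the potential is an
    # average U(theta) = (1/N) * sum_i U_i(theta) and full_grad_snap = grad U(theta_snap):
    # average the minibatch corrections and recenter at the stored snapshot gradient.
    correction = np.mean([grad_i(theta, i) - grad_i(theta_snap, i) for i in idx], axis=0)
    return full_grad_snap + correction

def leapfrog(theta, p, grad_U, step, n_steps):
    # Leapfrog integrator for HMC, where U is the potential (negative log-density).
    p = p - 0.5 * step * grad_U(theta)
    for _ in range(n_steps - 1):
        theta = theta + step * p
        p = p - step * grad_U(theta)
    theta = theta + step * p
    p = p - 0.5 * step * grad_U(theta)
    return theta, p
```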
arXiv Detail & Related papers (2021-02-09T02:44:24Z) - Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
We show that, for convex and first-order smooth potentials, the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both of these metrics.
arXiv Detail & Related papers (2020-07-22T18:18:28Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
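For reference, the basic asynchronous (single-transition) Q-learning update that such sample-complexity bounds analyze; this is the standard tabular rule, not the paper's variance-reduced variant, and the names are illustrative.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # One asynchronous (single-transition) Q-learning update: only the visited
    # (state, action) entry changes; all other entries of Q are left untouched.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```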
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - On the Convergence of Langevin Monte Carlo: The Interplay between Tail
Growth and Smoothness [10.482805367361818]
We show that for potentials with Lipschitz gradient, i.e. $\beta = 1$, our rate recovers the best known rate of dimension dependence.
Our results apply to sampling from a target distribution $\nu_* \propto e^{-f}$, with convergence measured in KL-divergence.
arXiv Detail & Related papers (2020-05-27T00:26:20Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.