Replica Exchange for Non-Convex Optimization
- URL: http://arxiv.org/abs/2001.08356v4
- Date: Wed, 16 Jun 2021 06:23:31 GMT
- Title: Replica Exchange for Non-Convex Optimization
- Authors: Jing Dong and Xin T. Tong
- Abstract summary: Gradient descent (GD) is known to converge quickly for convex objective functions, but it can be trapped at local minima.
Langevin dynamics (LD) can explore the state space and find global minima, but in order to give accurate estimates, LD needs to run with a small discretization step size and a weak stochastic force.
This paper shows that these two algorithms and their non-swapping variants can "collaborate" through a simple exchange mechanism.
- Score: 4.421561004829125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient descent (GD) is known to converge quickly for convex objective
functions, but it can be trapped at local minima. On the other hand, Langevin
dynamics (LD) can explore the state space and find global minima, but in order
to give accurate estimates, LD needs to run with a small discretization step
size and weak stochastic force, which in general slow down its convergence.
This paper shows that these two algorithms and their non-swapping variants can
"collaborate" through a simple exchange mechanism, in which they swap their
current positions if LD yields a lower objective function. This idea can be
seen as the singular limit of the replica-exchange technique from the sampling
literature. We show that this new algorithm converges to the global minimum
linearly with high probability, assuming the objective function is strongly
convex in a neighborhood of the unique global minimum. By replacing gradients
with stochastic gradients, and adding a proper threshold to the exchange
mechanism, our algorithm can also be used in online settings. We also study
non-swapping variants of the algorithm, which achieve similar performance. We
further verify our theoretical results through some numerical experiments and
observe superior performance of the proposed algorithm over running GD or LD
alone.
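The exchange mechanism described in the abstract is simple enough to sketch in a few lines. The Python snippet below is a minimal illustration, not the paper's implementation: the objective, step sizes, temperature, iteration count, and swap rule are assumptions chosen for readability, and the threshold used in the online/stochastic-gradient setting is only indicated in a comment.

```python
import numpy as np

def replica_exchange_gd_ld(f, grad_f, x0, y0, step_gd=0.05, step_ld=0.05,
                           temperature=1.0, n_iters=2000, seed=0):
    """Minimal sketch of the GD/LD exchange mechanism from the abstract.

    A gradient-descent chain x (fast local convergence) and a Langevin-
    dynamics chain y (global exploration) run in parallel; whenever the LD
    chain reaches a point with a lower objective value, the two chains swap
    their current positions.  Parameter names and defaults are illustrative
    assumptions, not the paper's tuned settings.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    best = x.copy()
    for _ in range(n_iters):
        # GD chain: plain gradient step, may get trapped at a local minimum.
        x = x - step_gd * grad_f(x)
        # LD chain: gradient step plus Gaussian noise, which lets it escape
        # local minima; a small step size and weak noise improve accuracy.
        y = (y - step_ld * grad_f(y)
             + np.sqrt(2.0 * step_ld * temperature) * rng.standard_normal(y.shape))
        # Exchange mechanism: swap if LD found a lower objective value.
        # (In the online/stochastic-gradient setting the paper adds a
        # threshold here to guard against noisy objective estimates.)
        if f(y) < f(x):
            x, y = y.copy(), x.copy()
        if f(x) < f(best):
            best = x.copy()
    return best

# Toy usage: a 1-D double-well objective whose right-hand well is only a
# local minimum; GD started at z = 1.2 stays there, but the exchange with
# the LD chain recovers the global minimum near z = -1.
f = lambda z: float(np.sum((z**2 - 1.0)**2 + 0.3 * z))
grad_f = lambda z: 4.0 * z * (z**2 - 1.0) + 0.3
print(replica_exchange_gd_ld(f, grad_f, x0=np.array([1.2]), y0=np.array([1.2])))
```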
Related papers
- Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials [15.718093624695552]
We analyze the convergence of Stochastic Gradient Langevin Dynamics (SGLD) to global minima based on Lyapunov potentials and optimization.
We provide 1) improved guarantees for SGLD as an optimization method in the settings of previous works, 2) the first finite gradient complexity results for SGLD, and 3) a proof that if continuous-time Langevin dynamics succeeds for optimization, then discrete-time SGLD succeeds under mild regularity assumptions (a minimal SGLD update sketch appears after this list).
arXiv Detail & Related papers (2024-07-05T05:34:10Z) - Using Differential Evolution to avoid local minima in Variational
Quantum Algorithms [0.0]
Variational Quantum Algorithms (VQAs) are among the most promising NISQ-era algorithms for harnessing quantum computing.
Our goal in this paper is to study alternative optimization methods that can avoid or reduce the effect of local minima and barren plateau problems.
arXiv Detail & Related papers (2023-03-21T20:31:06Z) - Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax optimization algorithms have been developed to solve many problems in training neural networks.
In this paper, we propose two types of minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove that our method achieves a gradient complexity guarantee.
arXiv Detail & Related papers (2023-02-08T01:42:45Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - A Communication-efficient Algorithm with Linear Convergence for
Federated Minimax Learning [1.713291434132985]
We study a large-scale multi-agent minimax optimization problem, which models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-GDA solution.
arXiv Detail & Related papers (2022-06-02T16:31:16Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - The Minimax Complexity of Distributed Optimization [0.0]
I present the "graph oracle model", an extension of the classic oracle framework that can be applied to distributed optimization.
I focus on the specific case of the "intermittent communication setting".
I analyze the theoretical properties of the popular Local Stochastic Gradient Descent (Local SGD) algorithm in the convex setting.
arXiv Detail & Related papers (2021-09-01T15:18:33Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Aligning Partially Overlapping Point Sets: an Inner Approximation
Algorithm [80.15123031136564]
We propose a robust method to align point sets where there is no prior information about the value of the transformation.
Our algorithm does not need regularization on the transformation, and can thus handle situations where no prior information about the transformation values is available.
Experimental results demonstrate the better robustness of the proposed method over state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-05T15:23:33Z) - Unified Analysis of Stochastic Gradient Methods for Composite Convex and
Smooth Optimization [15.82816385434718]
We present a unified theorem for the convergence analysis of stochastic gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer.
We do this by extending the unified analysis of Gorbunov, Hanzely & Richtárik (2020) and dropping the requirement that the loss function be strongly convex.
Our unified analysis applies to a host of existing algorithms such as proximal SGD, variance reduced methods, quantization and some coordinate descent type methods.
arXiv Detail & Related papers (2020-06-20T13:40:27Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconcave minimax problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
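Since several entries above (and the main paper) build on stochastic gradient Langevin dynamics, the following sketch shows a single SGLD update for reference. It assumes an objective given as an average of per-sample losses; the names and defaults (grad_loss, step, inv_temp, batch) are illustrative assumptions, not taken from any specific paper in this list.

```python
import numpy as np

def sgld_step(theta, data, grad_loss, step=1e-3, inv_temp=1.0, batch=32, rng=None):
    """One stochastic gradient Langevin dynamics (SGLD) update: a minibatch
    gradient step plus Gaussian noise whose scale is set by the step size
    and the inverse temperature.  Illustrative sketch only."""
    rng = np.random.default_rng() if rng is None else rng
    # Unbiased minibatch estimate of the full gradient.
    idx = rng.choice(len(data), size=min(batch, len(data)), replace=False)
    g = np.mean([grad_loss(theta, data[i]) for i in idx], axis=0)
    # Gradient step plus injected Gaussian noise; with a small step size and
    # weak noise (large inv_temp), the iterates track Langevin dynamics.
    noise = rng.standard_normal(theta.shape)
    return theta - step * g + np.sqrt(2.0 * step / inv_temp) * noise
```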
This list is automatically generated from the titles and abstracts of the papers on this site.