Learning Complexity of Simulated Annealing
- URL: http://arxiv.org/abs/2003.02981v2
- Date: Mon, 29 Jun 2020 21:17:37 GMT
- Title: Learning Complexity of Simulated Annealing
- Authors: Avrim Blum, Chen Dan, Saeed Seddighin
- Abstract summary: We study the criterion under which the temperature changes, namely the cooling schedule.
We provide positive results both in terms of sample complexity and simulation complexity.
We present polynomial-time algorithms with provable guarantees for the learning problem.
- Score: 23.678288517578764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simulated annealing is an effective and general means of optimization. It is
inspired by metallurgy, where the temperature of a material determines its
thermodynamic behavior. Likewise, in simulated annealing, the actions that the
algorithm takes depend entirely on the value of a variable that captures the
notion of temperature. Typically, simulated annealing starts with a high
temperature, which makes the algorithm's behavior largely random, and gradually
cools the temperature down so that the algorithm becomes more stable.
A key component that plays a crucial role in the performance of simulated
annealing is the criterion under which the temperature changes, namely the
cooling schedule. Motivated by this, we study the following question in this
work: "Given enough samples from the instances of a specific class of
optimization problems, can we design optimal (or approximately optimal) cooling
schedules that minimize the runtime or maximize the success rate of the
algorithm on average when the underlying problem is drawn uniformly at random
from the same class?"
We provide positive results both in terms of sample complexity and simulation
complexity. For sample complexity, we show that $\tilde O(\sqrt{m})$ samples
suffice to find an approximately optimal cooling schedule of length $m$. We
complement this result by giving a lower bound of $\tilde \Omega(m^{1/3})$ on
the sample complexity of any learning algorithm that provides an almost optimal
cooling schedule. These results are general and rely on no assumptions. For
simulation complexity, however, we make additional assumptions to measure the
success rate of an algorithm. To this end, we introduce the monotone stationary
graph that models the performance of simulated annealing. Based on this model,
we present polynomial time algorithms with provable guarantees for the learning
problem.
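To make the role of the cooling schedule concrete, the following is a minimal sketch, not the paper's algorithm or its monotone stationary graph model: a geometric cooling schedule, a standard Metropolis-style simulated annealing loop, and a naive selection of the best candidate schedule by averaging outcomes over sampled instances, in the spirit of the learning question above. The function names (`geometric_schedule`, `simulated_annealing`, `pick_schedule`), the toy objective family, and the candidate schedules are illustrative assumptions, not from the paper.

```python
import math
import random

def geometric_schedule(t0, alpha, m):
    """A common heuristic cooling schedule of length m: T_i = t0 * alpha**i."""
    return [t0 * alpha ** i for i in range(m)]

def simulated_annealing(objective, neighbor, x0, schedule, rng):
    """Minimize `objective`; worse moves are accepted with probability exp(-delta/T)."""
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    for temp in schedule:
        y = neighbor(x, rng)
        fy = objective(y)
        delta = fy - fx
        # Always accept improvements; accept worse moves with Boltzmann probability,
        # which shrinks as the temperature cools.
        if delta <= 0 or rng.random() < math.exp(-delta / max(temp, 1e-12)):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

def pick_schedule(instances, candidate_schedules, neighbor, rng, trials=5):
    """Naive learner: pick the candidate schedule with the best average final value
    over the sampled instances (a stand-in for the learning problem in the paper)."""
    def avg_value(schedule):
        total = 0.0
        for objective, x0 in instances:
            for _ in range(trials):
                total += simulated_annealing(objective, neighbor, x0, schedule, rng)[1]
        return total / (len(instances) * trials)
    return min(candidate_schedules, key=avg_value)

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy instance family: 1-D multimodal objectives with random phase shifts.
    def make_instance(shift):
        return (lambda x: (x - 2.0) ** 2 + math.sin(5.0 * (x + shift)), 0.0)
    instances = [make_instance(rng.uniform(0.0, math.pi)) for _ in range(10)]
    neighbor = lambda x, r: x + r.gauss(0.0, 0.5)
    candidates = [geometric_schedule(t0, a, 150) for t0 in (1.0, 5.0) for a in (0.95, 0.99)]
    best = pick_schedule(instances, candidates, neighbor, rng)
    print("chosen schedule: T0 =", best[0], ", cooling ratio =", best[1] / best[0])
```

The paper's results concern how many sampled instances are needed before such an empirically chosen schedule is approximately optimal for the whole class; the brute-force candidate search above is only for illustration.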
Related papers
- Quantum Simulation-Based Optimization of a Cooling System [0.0]
Quantum algorithms promise up to exponential speedups for specific tasks relevant to numerical simulations.
However, these advantages quickly vanish when considering data input and output on quantum computers.
The recently introduced Quantum Simulation-Based Optimization (QuSO) treats simulations as subproblems within a larger optimization problem.
arXiv Detail & Related papers (2025-04-21T21:58:21Z) - Parallel simulation for sampling under isoperimetry and score-based diffusion models [56.39904484784127]
As data size grows, reducing the iteration cost becomes an important goal.
Inspired by the success of the parallel simulation of the initial value problem in scientific computation, we propose parallel Picard methods for sampling tasks.
Our work highlights the potential advantages of simulation methods in scientific computation for dynamics-based sampling and diffusion models.
arXiv Detail & Related papers (2024-12-10T11:50:46Z) - Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Not All Semantics are Created Equal: Contrastive Self-supervised
Learning with Automatic Temperature Individualization [51.41175648612714]
We propose a new robust contrastive loss inspired by distributionally robust optimization (DRO).
We show that our algorithm automatically learns a suitable $\tau$ for each sample.
Our method outperforms prior strong baselines on unimodal and bimodal datasets.
arXiv Detail & Related papers (2023-05-19T19:25:56Z) - Uniform Generalization Bound on Time and Inverse Temperature for
Gradient Descent Algorithm and its Application to Analysis of Simulated
Annealing [0.0]
We propose a novel uniform generalization bound on the time and inverse temperature for gradient Langevin dynamics.
As an application, an evaluation of the effectiveness of simulated annealing is presented.
arXiv Detail & Related papers (2022-05-25T01:21:27Z) - An Algebraically Converging Stochastic Gradient Descent Algorithm for
Global Optimization [14.336473214524663]
A key component in the algorithm is the randomness based on the value of the objective function.
We prove the convergence of the algorithm with an algebraic rate under suitable tuning of the parameters.
We present several numerical examples to demonstrate the efficiency and robustness of the algorithm.
arXiv Detail & Related papers (2022-04-12T16:27:49Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Adaptive Sequential SAA for Solving Two-stage Stochastic Linear Programs [1.6181085766811525]
We present adaptive sequential SAA (sample average approximation) algorithms to solve large-scale two-stage linear programs.
The proposed algorithm can be stopped in finite time to return a solution endowed with a probabilistic guarantee on quality.
arXiv Detail & Related papers (2020-12-07T14:58:16Z) - A Simulated Annealing Algorithm for Joint Stratification and Sample
Allocation Designs [0.0]
This study combines simulated annealing with delta evaluation to solve the joint stratification and sample allocation problem.
In this problem, atomic strata are partitioned into mutually exclusive and collectively exhaustive strata.
The Bell number of possible solutions is enormous even for a moderate number of atomic strata, and the evaluation time of each solution adds a further layer of complexity.
arXiv Detail & Related papers (2020-11-25T20:27:49Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics
with Simulated Annealing [23.87373187143897]
Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima.
We show that a coordinate learning algorithm can be used to resolve the same problem in physical simulations.
arXiv Detail & Related papers (2020-05-29T14:44:24Z) - Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of an optimistic adaptive gradient algorithm for nonconcave min-max problems.
Our experiments show that the predicted advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.