Global Optimization via Softmin Energy Minimization
- URL: http://arxiv.org/abs/2509.17815v1
- Date: Mon, 22 Sep 2025 14:09:19 GMT
- Title: Global Optimization via Softmin Energy Minimization
- Authors: Andrea Agazzi, Vittorio Carlei, Marco Romito, Samuele Saviozzi
- Abstract summary: This paper introduces a novel gradient-based particle optimization method designed to efficiently escape local minima and locate global optima. We show that our method facilitates faster transitions between local minima by reducing effective potential barriers with respect to Simulated Annealing.
- Score: 3.966519779235704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Global optimization, particularly for non-convex functions with multiple local minima, poses significant challenges for traditional gradient-based methods. While metaheuristic approaches offer empirical effectiveness, they often lack theoretical convergence guarantees and may disregard available gradient information. This paper introduces a novel gradient-based swarm particle optimization method designed to efficiently escape local minima and locate global optima. Our approach leverages a "Soft-min Energy" interacting function, $J_\beta(\mathbf{x})$, which provides a smooth, differentiable approximation of the minimum function value within a particle swarm. We define a stochastic gradient flow in the particle space, incorporating a Brownian motion term for exploration and a time-dependent parameter $\beta$ to control smoothness, similar to temperature annealing. We theoretically demonstrate that for strongly convex functions, our dynamics converges to a stationary point where at least one particle reaches the global minimum, with the other particles exhibiting exploratory behavior. Furthermore, we show that our method facilitates faster transitions between local minima by reducing effective potential barriers with respect to Simulated Annealing. More specifically, we estimate the hitting times of unexplored potential wells for our model in the small-noise regime and show that they compare favorably with those of overdamped Langevin dynamics. Numerical experiments on benchmark functions, including double wells and the Ackley function, validate our theoretical findings and demonstrate better performance than the well-known Simulated Annealing method in terms of escaping local minima and achieving faster convergence.
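To make the construction concrete, here is a minimal NumPy sketch of the particle dynamics, assuming the usual log-sum-exp form $J_\beta(\mathbf{x}) = -\tfrac{1}{\beta}\log\sum_i e^{-\beta f(x_i)}$ (the paper's exact normalization may differ), an Euler-Maruyama discretization, and an illustrative linear annealing schedule for $\beta$:

```python
import numpy as np

def softmin_energy_step(x, f, grad_f, beta, dt, sigma, rng):
    """One Euler-Maruyama step of the softmin-energy particle flow.

    Assumes J_beta(x) = -(1/beta) * log(sum_i exp(-beta * f(x_i))), whose
    gradient w.r.t. particle i is w_i * grad_f(x_i), where the softmin
    weights are w_i = exp(-beta * f(x_i)) / sum_j exp(-beta * f(x_j)).
    """
    fx = np.array([f(xi) for xi in x])
    logits = -beta * (fx - fx.min())           # subtract min for stability
    w = np.exp(logits) / np.exp(logits).sum()  # softmin weights
    drift = -w[:, None] * np.array([grad_f(xi) for xi in x])
    noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x + dt * drift + noise

# Toy usage: 1D double well f(x) = (x^2 - 1)^2 with an annealed beta.
rng = np.random.default_rng(0)
f = lambda x: float((x @ x - 1.0) ** 2)
grad_f = lambda x: 4.0 * (x @ x - 1.0) * x
x = rng.uniform(-2.0, 2.0, size=(10, 1))
for t in range(2000):
    beta = 1.0 + 0.01 * t                      # illustrative annealing schedule
    x = softmin_energy_step(x, f, grad_f, beta, dt=1e-3, sigma=0.5, rng=rng)
```

As $\beta$ grows, the softmin weights concentrate the drift on the best particles, while the Brownian term keeps the remaining particles exploring.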
Related papers
- Zeroth-Order Optimization Finds Flat Minima [51.41529512093436]
We show that zeroth-order optimization with the standard two-point estimator favors solutions with a small trace of the Hessian.
We further provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions.
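For reference, a minimal sketch of the standard two-point estimator mentioned in the summary; the uniform-on-sphere direction and the smoothing radius `mu` are common choices, not necessarily the paper's exact setup:

```python
import numpy as np

def two_point_gradient(f, x, mu=1e-3, rng=None):
    """Standard two-point zeroth-order gradient estimator.

    Queries f at x + mu*u and x - mu*u for a random unit direction u and
    returns d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u, an unbiased
    estimate of the gradient of a smoothed version of f.
    """
    if rng is None:
        rng = np.random.default_rng()
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                      # uniform direction on the sphere
    return d * (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

# Usage: estimate the gradient of f(x) = ||x||^2 at x = (1, 1).
g = two_point_gradient(lambda x: x @ x, np.array([1.0, 1.0]))
```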
arXiv Detail & Related papers (2025-06-05T17:59:09Z) - Semi-Implicit Functional Gradient Flow for Efficient Sampling [30.32233517392456]
We propose a functional gradient ParVI method that uses perturbed particles with Gaussian noise as the approximation family.
We show that the corresponding functional gradient flow, which can be estimated via denoising score matching with neural networks, exhibits strong theoretical convergence guarantees.
In addition, we present an adaptive version of our method that automatically selects the appropriate noise magnitude during sampling.
arXiv Detail & Related papers (2024-10-23T15:00:30Z) - Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization [108.35402316802765]
We propose a new first-order optimization algorithm -- Accelerated Gradient-Optimistic Gradient (AG-OG) Descent Ascent.
We show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings.
We extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings.
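As background, here is the plain optimistic gradient descent-ascent (OGDA) update that AG-OG combines with Nesterov acceleration; the acceleration itself, which is the paper's contribution, is omitted from this sketch:

```python
def ogda(grad_x, grad_y, x, y, eta=0.1, steps=500):
    """Plain optimistic gradient descent-ascent for min_x max_y f(x, y).

    Each player steps along the extrapolated gradient 2*g_t - g_{t-1};
    AG-OG additionally wraps this in Nesterov-style acceleration, which
    is omitted here.
    """
    gx_prev, gy_prev = grad_x(x, y), grad_y(x, y)
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x -= eta * (2 * gx - gx_prev)   # optimistic descent step on x
        y += eta * (2 * gy - gy_prev)   # optimistic ascent step on y
        gx_prev, gy_prev = gx, gy
    return x, y

# Bilinear toy problem f(x, y) = x * y with saddle point at (0, 0).
x, y = ogda(lambda x, y: y, lambda x, y: x, x=1.0, y=1.0)
```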
arXiv Detail & Related papers (2022-10-31T17:59:29Z) - Sampling with Mollified Interaction Energy Descent [57.00583139477843]
We present a new optimization-based method for sampling called mollified interaction energy descent (MIED).
MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs).
We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD.
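To convey the flavor of descending an interaction energy over particles, here is a simplified stand-in: a Gaussian-mollified pairwise repulsion plus attraction to the target's log-density. This is not the paper's actual MIE, which couples the mollifier to the target density and also handles constrained domains:

```python
import numpy as np

def interaction_energy_descent(grad_log_p, x, eps=0.1, eta=0.05, steps=1000):
    """Gradient descent on a simplified particle interaction energy.

    Energy: -(1/N) * sum_i log p(x_i) + (1/N^2) * sum_{i,j} phi(x_i - x_j),
    where phi(z) = exp(-|z|^2 / (2*eps^2)) is a Gaussian mollifier acting
    as pairwise repulsion. A stand-in, not the paper's exact MIE.
    """
    n = x.shape[0]
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]             # pairwise x_i - x_j
        phi = np.exp(-(diff ** 2).sum(-1) / (2 * eps ** 2))
        repel = -(2 / n ** 2) * (phi[..., None] * diff).sum(1) / eps ** 2
        attract = -np.array([grad_log_p(xi) for xi in x]) / n
        x = x - eta * (attract + repel)                  # descend the energy
    return x

# Toy usage: target is a standard Gaussian, so grad log p(x) = -x.
rng = np.random.default_rng(1)
samples = interaction_energy_descent(lambda x: -x, rng.normal(size=(50, 2)))
```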
arXiv Detail & Related papers (2022-10-24T16:54:18Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
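A minimal sketch of Random Reshuffling, the without-replacement scheme analyzed above; the synchronized-shuffling modification is not shown:

```python
import numpy as np

def random_reshuffling(grad_i, x, n, eta=0.01, epochs=20, seed=0):
    """SGD with Random Reshuffling.

    Each epoch visits all n component gradients exactly once, in a fresh
    random order, i.e. gradients are drawn WITHOUT replacement.
    """
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(n):   # a new permutation every epoch
            x = x - eta * grad_i(x, i)
    return x
```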
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
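A hedged sketch of what coordinate-wise ZO estimation with importance sampling can look like; the sampling distribution `probs` stands in for the paper's coordinate-importance design, whose exact form the summary does not give:

```python
import numpy as np

def zo_coordinate_gradient(f, x, probs, mu=1e-4, rng=None):
    """Coordinate-wise zeroth-order gradient estimate with importance sampling.

    Samples coordinate j with probability probs[j], estimates the partial
    derivative with a central finite difference (two function queries), and
    reweights by 1/probs[j] to keep the estimator unbiased. The choice of
    `probs` stands in for the paper's coordinate-importance design.
    """
    if rng is None:
        rng = np.random.default_rng()
    j = rng.choice(x.size, p=probs)
    e = np.zeros_like(x)
    e[j] = 1.0
    g = np.zeros_like(x)
    g[j] = (f(x + mu * e) - f(x - mu * e)) / (2 * mu) / probs[j]
    return g
```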
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion [67.66101533752605]
Langevin diffusion is a powerful method for non-convex optimization.
We propose replica exchange, which swaps Langevin diffusions with different temperatures.
By discretizing the replica exchange Langevin diffusion, we obtain a discrete-time algorithm.
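A minimal sketch of one discretized step, using the standard Metropolis-style swap rule; the paper's swap-intensity constants and corrections are omitted:

```python
import numpy as np

def replica_exchange_step(x_lo, x_hi, f, grad_f, dt, tau_lo, tau_hi, rng):
    """One discretized replica-exchange Langevin step (sketch).

    Two overdamped Langevin chains run at a low and a high temperature; a
    Metropolis-style test occasionally swaps their positions so the
    low-temperature chain inherits the explorer's discoveries.
    """
    step = lambda x, tau: (x - dt * grad_f(x)
                           + np.sqrt(2.0 * tau * dt) * rng.standard_normal(x.shape))
    x_lo, x_hi = step(x_lo, tau_lo), step(x_hi, tau_hi)
    # Accept a swap w.p. min(1, exp((1/tau_lo - 1/tau_hi) * (f(x_lo) - f(x_hi)))).
    if np.log(rng.uniform()) < (1.0 / tau_lo - 1.0 / tau_hi) * (f(x_lo) - f(x_hi)):
        x_lo, x_hi = x_hi, x_lo
    return x_lo, x_hi
```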
arXiv Detail & Related papers (2020-07-04T02:52:11Z) - CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics
with Simulated Annealing [23.87373187143897]
Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima.
We show that a coordinate learning algorithm can be used to resolve the same problem in physical simulations.
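Assuming the core idea is that the momentum coefficient plays the role of an annealed temperature, a schematic version might look as follows; the linear cooling schedule is illustrative, not CoolMomentum's exact prescription:

```python
import numpy as np

def cool_momentum(grad, x, eta=0.01, steps=5000, rho0=0.99):
    """Momentum SGD with an annealed momentum coefficient (schematic).

    The momentum coefficient rho is treated as an effective temperature:
    large rho early keeps the iterate energetic enough to hop between
    local minima, and cooling rho toward zero lets it settle.
    """
    v = np.zeros_like(x)
    for t in range(steps):
        rho = rho0 * (1.0 - t / steps)   # illustrative linear cooling
        v = rho * v - eta * grad(x)
        x = x + v
    return x
```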
arXiv Detail & Related papers (2020-05-29T14:44:24Z) - Consensus-Based Optimization on Hypersurfaces: Well-Posedness and
Mean-Field Limit [7.998311072988401]
We introduce a new stochastic differential model for global optimization of non-convex functions on compact hypersurfaces.
In particular, as soon as the consensus is reached, the stochastic component vanishes.
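The consensus dynamics shared by this entry and the sphere paper below can be sketched in unconstrained Euclidean form; the papers' versions add a projection onto the hypersurface (e.g., renormalizing onto the sphere) after each step:

```python
import numpy as np

def cbo_step(x, f, dt, lam, sigma, alpha, rng):
    """One step of unconstrained consensus-based optimization (sketch).

    Particles drift toward a softmin-weighted consensus point and diffuse
    with noise scaled by their distance from it, so the stochastic
    component vanishes once consensus is reached. For the sphere, each
    update would be followed by the projection x_i / |x_i|.
    """
    fx = np.array([f(xi) for xi in x])
    w = np.exp(-alpha * (fx - fx.min()))           # softmin weights (stable)
    v = (w[:, None] * x).sum(0) / w.sum()          # consensus point
    drift = -lam * (x - v) * dt
    noise = sigma * (x - v) * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x + drift + noise
```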
arXiv Detail & Related papers (2020-01-31T18:33:08Z) - Consensus-Based Optimization on the Sphere: Convergence to Global
Minimizers and Machine Learning [7.998311072988401]
We investigate the implementation of a new Kuramoto-Vicsek-type model for global optimization of non-convex functions on the sphere.
We present several numerical experiments, which show that the algorithm proposed in the present paper scales well with the dimension and is extremely versatile.
arXiv Detail & Related papers (2020-01-31T18:23:26Z) - Replica Exchange for Non-Convex Optimization [4.421561004829125]
Gradient descent (GD) is known to converge quickly for convex objective functions, but it can be trapped at local minima.
Langevin dynamics (LD) can explore the state space and find global minima, but in order to give accurate estimates, LD needs to run with a small discretization step size and a weak stochastic force.
This paper shows that these two algorithms and their non-swapping variants can "collaborate" through a simple exchange mechanism.
arXiv Detail & Related papers (2020-01-23T03:13:19Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)