Exploring Landscapes for Better Minima along Valleys
- URL: http://arxiv.org/abs/2510.27153v1
- Date: Fri, 31 Oct 2025 03:53:49 GMT
- Title: Exploring Landscapes for Better Minima along Valleys
- Authors: Tong Zhao, Jiacheng Li, Yuanchang Zhou, Guangming Tan, Weile Jia
- Abstract summary: We propose an adaptor "E" for gradient-based optimizers. We show that the adapted Lamb, ALTO, increases the test accuracy (generalization) of the current state-of-the-art optimizer by an average of 2.5% across a variety of large-batch training tasks.
- Score: 12.577989304074288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding lower and better-generalizing minima is crucial for deep learning. However, most existing optimizers stop searching the parameter space once they reach a local minimum. Given the complex geometric properties of the loss landscape, it is difficult to guarantee that such a point is the lowest or provides the best generalization. To address this, we propose an adaptor "E" for gradient-based optimizers. The adapted optimizer tends to continue exploring along landscape valleys (areas with low and nearly identical losses) in order to search for potentially better local minima even after reaching a local minimum. This approach increases the likelihood of finding a lower and flatter local minimum, which is often associated with better generalization. We also provide a proof of convergence for the adapted optimizers in both convex and non-convex scenarios for completeness. Finally, we demonstrate their effectiveness in an important but notoriously difficult training scenario, large-batch training, where Lamb is the benchmark optimizer. Our testing results show that the adapted Lamb, ALTO, increases the test accuracy (generalization) of the current state-of-the-art optimizer by an average of 2.5% across a variety of large-batch training tasks. This work potentially opens a new research direction in the design of optimization algorithms.
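The valley-exploration idea is easiest to see on a toy landscape. Below is a minimal, hypothetical NumPy sketch, not the paper's actual "E" adaptor or the ALTO update rule: the toy loss function, the plateau test, and the momentum-based drift heuristic are all assumptions made purely for illustration.

```python
import numpy as np

def loss(w):
    # Toy landscape: a parabolic valley (floor at y = 0.5 * x**2) whose
    # nearly flat floor steps down into a lower basin as x decreases.
    x, y = w
    return (y - 0.5 * x ** 2) ** 2 - 0.1 / (1.0 + np.exp(x + 3.0))

def grad(w, eps=1e-6):
    # Central finite differences keep the sketch dependency-free.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2.0 * eps)
    return g

def explore_along_valley(w, lr=0.05, steps=5000, plateau_tol=1e-3, drift=10.0):
    # Gradient descent that, near a stationary point, keeps drifting along a
    # momentum average of recent steps (a crude proxy for the valley
    # direction) instead of halting, and remembers the best point seen.
    best_w, best_loss = w.copy(), loss(w)
    m = np.zeros_like(w)  # running average of recent update directions
    for _ in range(steps):
        g = grad(w)
        if g @ g < plateau_tol and np.linalg.norm(m) > 0:
            step = drift * lr * m / np.linalg.norm(m)  # explore along the valley
        else:
            step = -lr * g  # ordinary descent step
        m = 0.9 * m + 0.1 * step
        w = w + step
        cur = loss(w)
        if cur < best_loss:
            best_w, best_loss = w.copy(), cur
    return best_w, best_loss

# With drift=0 the run stops at the first near-stationary point; with drift
# enabled it keeps moving along the nearly flat floor toward the lower basin.
w0 = np.array([2.0, 3.0])
print(explore_along_valley(w0, drift=0.0)[1])
print(explore_along_valley(w0, drift=10.0)[1])
```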
Related papers
- Zeroth-Order Optimization Finds Flat Minima [51.41529512093436]
We show that zeroth-order optimization with the standard two-point estimator favors solutions with a small trace of the Hessian (a minimal sketch of this estimator appears after this list).
We further provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions.
arXiv Detail & Related papers (2025-06-05T17:59:09Z)
- Localized Zeroth-Order Prompt Optimization [54.964765668688806]
We propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO).
ZOPO incorporates a Neural Tangent Kernel-derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization.
Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency.
arXiv Detail & Related papers (2024-03-05T14:18:15Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaption method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation [84.0621253654014]
We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest.
We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO.
arXiv Detail & Related papers (2023-07-25T09:45:47Z)
- Using Differential Evolution to avoid local minima in Variational Quantum Algorithms [0.0]
Variational Quantum Algorithms (VQAs) are among the most promising NISQ-era algorithms for harnessing quantum computing.
Our goal in this paper is to study alternative optimization methods that can avoid or reduce the effect of local minima and barren plateau problems.
arXiv Detail & Related papers (2023-03-21T20:31:06Z)
- A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new swarm-based framework utilizing the underlying dynamical process of gradient descent.
The biggest advantage of this approach is greater exploration around the current state before deciding on the descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z)
- Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
Recent developments in bilevel optimization provide the first fundamental nonasymptotic multi-step analysis.
arXiv Detail & Related papers (2022-02-08T07:10:06Z)
- Questions for Flat-Minima Optimization of Modern Neural Networks [28.12506392321345]
Two methods for finding flat minima stand out: 1. Averaging methods (i.e., Stochastic Weight Averaging, SWA) and 2. Minimax methods (i.e., Sharpness-Aware Minimization, SAM).
We investigate the loss surfaces from a systematic benchmarking of these approaches across computer vision, natural language processing, and graph learning tasks.
arXiv Detail & Related papers (2022-02-01T18:56:15Z)
- Why Do Local Methods Solve Nonconvex Problems? [54.284687261929115]
Nonconvex optimization is ubiquitous in modern machine learning.
We rigorously formalize it for instances of machine learning problems.
We hypothesize a unified explanation for this phenomenon.
arXiv Detail & Related papers (2021-03-24T19:34:11Z)
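For context, the two-point estimator referenced in "Zeroth-Order Optimization Finds Flat Minima" above admits a compact sketch. The snippet below is an illustrative NumPy version of the standard two-point zeroth-order gradient estimator; the function name, step size, and smoothing parameter mu are choices made here, not taken from that paper.

```python
import numpy as np

def two_point_gradient_estimate(f, x, mu=1e-3, rng=np.random.default_rng(0)):
    # Sample a random Gaussian direction u, then estimate the gradient of f
    # at x from just two function evaluations, at x + mu*u and x - mu*u.
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

# Usage: a few steps of zeroth-order descent on a simple quadratic.
f = lambda x: float(np.sum(x ** 2))
x = np.ones(5)
for _ in range(200):
    x = x - 0.02 * two_point_gradient_estimate(f, x)
print(f(x))  # decreases toward 0 in expectation
```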
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.