Why Do Local Methods Solve Nonconvex Problems?
- URL: http://arxiv.org/abs/2103.13462v1
- Date: Wed, 24 Mar 2021 19:34:11 GMT
- Title: Why Do Local Methods Solve Nonconvex Problems?
- Authors: Tengyu Ma
- Abstract summary: Non-convex optimization is ubiquitous in modern machine learning.
We hypothesize a unified explanation for this phenomenon.
We rigorously formalize it for concrete instances of machine learning problems.
- Score: 54.284687261929115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-convex optimization is ubiquitous in modern machine learning. Researchers
devise non-convex objective functions and optimize them using off-the-shelf
optimizers such as stochastic gradient descent and its variants, which leverage
the local geometry and update iteratively. Even though solving non-convex
functions is NP-hard in the worst case, the optimization quality in practice is
often not an issue -- optimizers are largely believed to find approximate
global minima. Researchers hypothesize a unified explanation for this
intriguing phenomenon: most of the local minima of the practically-used
objectives are approximately global minima. We rigorously formalize it for
concrete instances of machine learning problems.
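To make the hypothesized phenomenon concrete, the following is a minimal sketch (not from the paper) of plain gradient descent on the non-convex low-rank matrix factorization objective f(U) = ||U U^T - M||_F^2, a standard instance for which all local minima are known to be global; the dimensions, step size, and iteration count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, r = 20, 3

# Ground-truth rank-r PSD matrix M = U_star @ U_star.T, scaled so its spectrum is O(1).
U_star = rng.standard_normal((d, r)) / np.sqrt(d)
M = U_star @ U_star.T

def loss(U):
    return np.linalg.norm(U @ U.T - M) ** 2        # squared Frobenius norm

def grad(U):
    return 4.0 * (U @ U.T - M) @ U                 # gradient of the objective in U

U = rng.standard_normal((d, r)) / np.sqrt(d)       # random initialization
step = 0.02
for _ in range(20000):
    U = U - step * grad(U)

# Despite non-convexity, plain gradient descent typically ends close to a global minimum here.
print(f"final objective: {loss(U):.2e}")
print(f"relative error : {np.linalg.norm(U @ U.T - M) / np.linalg.norm(M):.2e}")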
Related papers
- Super Gradient Descent: Global Optimization requires Global Gradient [0.0]
This article introduces a novel optimization method that guarantees convergence to the global minimum for any k-Lipschitz function defined on a closed interval.
Our approach addresses the limitations of traditional optimization algorithms, which often get trapped in local minima.
arXiv Detail & Related papers (2024-10-25T17:28:39Z)
- Review Non-convex Optimization Method for Machine Learning [0.0]
Non-convex optimization is a critical tool in advancing machine learning, especially for complex models like deep neural networks and support vector machines.
This paper examines the methods and applications of non-convex optimization in machine learning.
arXiv Detail & Related papers (2024-10-02T20:34:33Z)
- A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new swarm-based framework utilizing the underlying dynamical process of gradient descent.
The biggest advantage of this approach is greater exploration around the current state before deciding the descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z)
- Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of non-convex problems.
We show that for weakly-convex objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z)
- Fighting the curse of dimensionality: A machine learning approach to finding global optima [77.34726150561087]
This paper shows how to find global optima in structural optimization problems.
By exploiting certain cost functions, we either obtain the global optimum at best or obtain results superior to established optimization procedures at worst.
arXiv Detail & Related papers (2021-10-28T09:50:29Z)
- Combining resampling and reweighting for faithful stochastic optimization [1.52292571922932]
When the loss function is a sum of multiple terms, a popular method is stochastic gradient descent.
We show that the difference in the Lipschitz constants of the terms in the loss function causes stochastic gradient descent to exhibit different variances at different minima (a small numerical sketch of this effect follows the list below).
arXiv Detail & Related papers (2021-05-31T04:21:25Z)
- Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep learning and other problems in data science, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z)
- Community detection using fast low-cardinality semidefinite programming [94.4878715085334]
We propose a new low-cardinality algorithm that generalizes the local update to maximize a semidefinite relaxation derived from max-k-cut.
The proposed algorithm is scalable, outperforms state-of-the-art algorithms, and does so with little additional cost in real-world running time.
arXiv Detail & Related papers (2020-12-04T15:46:30Z)
- Newton-type Methods for Minimax Optimization [37.58722381375258]
We propose two novel Newton-type algorithms for nonconvex-nonconcave minimax optimization.
We prove their local convergence at strict local minimax points, which are surrogates of global solutions.
arXiv Detail & Related papers (2020-06-25T17:38:00Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we study gradient-free minimax optimization in the nonconvex strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
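As a concrete illustration of the variance effect described in the resampling-and-reweighting entry above, here is a minimal numerical sketch (not taken from that paper): the averaged loss F(x) = (x^2 - 1)^2 is written as the mean of two hand-picked terms with very different gradients, and the spread of the per-term gradients, which sets the noise level of stochastic gradient descent, vanishes at one minimum but not at the other. The specific functions and constants are illustrative assumptions.

import numpy as np

# Average loss F(x) = 0.5*(f1(x) + f2(x)) = (x**2 - 1)**2, with minima at x = +1 and x = -1.
# The two terms are chosen so that their gradients agree at x = +1 but disagree strongly at x = -1.
def grad_f1(x):          # f1(x) = 2*(x**2 - 1)**2 + 5*(x - 1)**2
    return 8 * x * (x**2 - 1) + 10 * (x - 1)

def grad_f2(x):          # f2(x) = -5*(x - 1)**2
    return -10 * (x - 1)

def sgd_noise_std(x):
    # SGD samples one term uniformly at random; at a minimum of F the full gradient
    # is zero, so the spread of the per-term gradients is the SGD noise level there.
    return np.std([grad_f1(x), grad_f2(x)])

for x_star in (+1.0, -1.0):
    print(f"minimum at x = {x_star:+.0f}: SGD noise std ~ {sgd_noise_std(x_star):.1f}")

# Prints roughly 0.0 at x = +1 and 20.0 at x = -1: with the same step size, plain SGD
# fluctuates far more around one minimum than the other, which is the kind of imbalance
# that resampling and reweighting schemes aim to correct.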
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.