In almost all shallow analytic neural network optimization landscapes, efficient minimizers have strongly convex neighborhoods
- URL: http://arxiv.org/abs/2504.08867v1
- Date: Fri, 11 Apr 2025 10:43:17 GMT
- Title: In almost all shallow analytic neural network optimization landscapes, efficient minimizers have strongly convex neighborhoods
- Authors: Felix Benning, Steffen Dereich
- Abstract summary: We rigorously analyze the prevalence of this property for the mean squared error induced by shallow, 1-hidden layer neural networks. We show that for certain randomly picked regression problems the optimization landscape is almost surely a Morse function on the efficient domain.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Whether or not a local minimum of a cost function has a strongly convex neighborhood greatly influences the asymptotic convergence rate of optimizers. In this article, we rigorously analyze the prevalence of this property for the mean squared error induced by shallow, 1-hidden layer neural networks with analytic activation functions when applied to regression problems. The parameter space is divided into two domains: the 'efficient domain' (all parameters for which the respective realization function cannot be generated by a network having a smaller number of neurons) and the 'redundant domain' (the remaining parameters). In almost all regression problems on the efficient domain the optimization landscape only features local minima that are strongly convex. Formally, we will show that for certain randomly picked regression problems the optimization landscape is almost surely a Morse function on the efficient domain. The redundant domain has significantly smaller dimension than the efficient domain and on this domain, potential local minima are never isolated.
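As a rough illustration of the property the abstract is about (a toy sketch, not the paper's construction: the tanh activation, the data, the network width, and the finite-difference Hessian are all choices made for this example), the following Python snippet fits a shallow 1-hidden-layer network to a small regression problem with plain gradient descent and then checks numerically whether the Hessian of the mean squared error is positive definite at the point it converged to, i.e. whether that point has a strongly convex neighborhood.

```python
# Toy sketch (not from the paper): probe whether a local minimizer of the MSE
# of a shallow tanh network has a strongly convex neighborhood by checking
# that the Hessian of the loss is positive definite there.  Data, width and
# the finite-difference Hessian are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# Small regression problem: n scalar input/output pairs.
n, width = 50, 3
X = rng.normal(size=(n, 1))
y = np.sin(2.0 * X[:, 0])

def unpack(theta):
    """Split the flat parameter vector into (W, b, v) of a 1-hidden-layer net."""
    W = theta[:width].reshape(width, 1)
    b = theta[width:2 * width]
    v = theta[2 * width:]
    return W, b, v

def mse(theta):
    """Mean squared error of f(x) = v . tanh(W x + b) on the data set."""
    W, b, v = unpack(theta)
    hidden = np.tanh(X @ W.T + b)      # shape (n, width)
    return np.mean((hidden @ v - y) ** 2)

def num_grad(f, theta, eps=1e-6):
    """Central finite-difference gradient."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = eps
        g[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

def num_hessian(f, theta, eps=1e-4):
    """Central finite-difference Hessian; fine for this tiny example."""
    d = theta.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = eps
            ej = np.zeros(d); ej[j] = eps
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * eps ** 2)
    return 0.5 * (H + H.T)

# Crude gradient descent, just to land near some local minimizer.
theta = rng.normal(scale=0.5, size=3 * width)
for _ in range(5000):
    theta -= 0.1 * num_grad(mse, theta)

eigvals = np.linalg.eigvalsh(num_hessian(mse, theta))
print("loss:", mse(theta), "smallest Hessian eigenvalue:", eigvals.min())
# A clearly positive smallest eigenvalue certifies (numerically) a strongly
# convex neighborhood; eigenvalues near zero indicate flat directions, which
# for almost every regression problem the paper locates in the redundant
# domain (or simply mean that gradient descent has not converged yet).
```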
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time [45.72323731094864]
In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations.
Our study sheds new light on understanding why local methods work well.
arXiv Detail & Related papers (2024-02-06T01:29:35Z) - Does a sparse ReLU network training problem always admit an optimum? [0.0]
We show that the existence of an optimal solution is not always guaranteed, especially in the context of sparse ReLU neural networks.
In particular, we first show that optimization problems involving deep networks with certain sparsity patterns do not always have optimal parameters.
arXiv Detail & Related papers (2023-06-05T08:01:50Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Noisy Low-rank Matrix Optimization: Geometry of Local Minima and
Convergence Rate [14.191310794366075]
This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning.
We develop a framework that can deal with random corruptions to general objective functions, where the noise model is arbitrary.
We characterize the geometry of the spurious local minima of the problem in a local region around the ground truth when the RIP constant is greater than $1/3$.
arXiv Detail & Related papers (2022-03-08T07:44:47Z) - On the Omnipresence of Spurious Local Minima in Certain Neural Network
Training Problems [0.0]
We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output.
It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine.
arXiv Detail & Related papers (2022-02-23T14:41:54Z) - Adaptive neural domain refinement for solving time-dependent
differential equations [0.0]
A classic approach for solving differential equations with neural networks builds upon neural forms, which employ the differential equation with a discretisation of the solution domain.
It would be desirable to transfer such important and successful strategies to the field of neural network based solutions.
We propose a novel adaptive neural approach to meet this aim for solving time-dependent problems.
arXiv Detail & Related papers (2021-12-23T13:19:07Z) - Regret Bounds for Gaussian-Process Optimization in Large Domains [40.92207267407271]
We provide upper bounds on the suboptimality (Bayesian simple regret) of the solution found by optimization strategies.
These regret bounds illuminate the relationship between the number of evaluations, the domain size, and the optimality of the retrieved function value.
In particular, they show that even when the number of evaluations is far too small to find the global optimum, we can find nontrivial function values.
arXiv Detail & Related papers (2021-04-29T05:19:03Z) - Why Do Local Methods Solve Nonconvex Problems? [54.284687261929115]
Non-convex optimization is ubiquitous in modern machine learning.
We rigorously formalize it for instances of machine learning problems.
We hypothesize a unified explanation for this phenomenon.
arXiv Detail & Related papers (2021-03-24T19:34:11Z) - Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data analysis, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - The Effects of Mild Over-parameterization on the Optimization Landscape
of Shallow ReLU Neural Networks [36.35321290763711]
We prove that the objective is strongly convex around the global minima when the teacher and student networks possess the same number of neurons.
For the non-global minima, we prove that adding even just a single neuron will turn a non-global minimum into a saddle point.
arXiv Detail & Related papers (2020-06-01T15:13:15Z)
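As a rough numerical companion to the entry above (a hypothetical setup, not code from that paper): in a teacher-student ReLU network of equal width, the empirical mean squared error has zero residuals when the student equals the teacher, so its Hessian there reduces to the Gauss-Newton matrix $(2/n) J^\top J$. The sketch below builds that matrix for random data and checks that it is positive definite, the numerical counterpart of the quoted "strongly convex around the global minima" statement.

```python
# Hypothetical teacher-student sketch: at the global minimum (student equals
# teacher) the residuals of the empirical MSE vanish, so its Hessian equals
# the Gauss-Newton matrix (2/n) J^T J.  We check that this matrix is positive
# definite for random data; widths, dimension and sample size are made up.
import numpy as np

rng = np.random.default_rng(1)
n, d, width = 500, 4, 3

X = rng.normal(size=(n, d))
W_teacher = rng.normal(size=(width, d))            # teacher hidden weights
y = np.maximum(X @ W_teacher.T, 0.0).sum(axis=1)   # labels from the teacher

def prediction_jacobian(W):
    """Jacobian of f(x) = sum_k relu(w_k . x) w.r.t. the flattened W."""
    gate = (X @ W.T > 0.0).astype(float)           # ReLU derivative, (n, width)
    # d f / d W_kj = gate_k(x) * x_j
    return (gate[:, :, None] * X[:, None, :]).reshape(n, -1)

J = prediction_jacobian(W_teacher)                 # evaluated at zero residual
H = (2.0 / n) * J.T @ J                            # Gauss-Newton = Hessian here
print("smallest Hessian eigenvalue:", np.linalg.eigvalsh(H).min())
```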