Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural
Networks
- URL: http://arxiv.org/abs/2107.10370v1
- Date: Wed, 21 Jul 2021 22:05:48 GMT
- Title: Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural
Networks
- Authors: Yossi Arjevani, Michael Field
- Abstract summary: We show that the Hessian spectrum concentrates near small positive constants, with the exception of $\Theta(d)$ eigenvalues which grow linearly with $d$.
This makes possible the study of the creation and the annihilation of spurious minima using powerful tools from equivariant bifurcation theory.
- Score: 15.711517003382484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the optimization problem associated with fitting two-layer ReLU
neural networks with respect to the squared loss, where labels are generated by
a target network. We make use of the rich symmetry structure to develop a novel
set of tools for studying families of spurious minima. In contrast to existing
approaches which operate in limiting regimes, our technique directly addresses
the nonconvex loss landscape for a finite number of inputs $d$ and neurons $k$,
and provides analytic, rather than heuristic, information. In particular, we
derive analytic estimates for the loss at different minima, and prove that
modulo $O(d^{-1/2})$-terms the Hessian spectrum concentrates near small
positive constants, with the exception of $\Theta(d)$ eigenvalues which grow
linearly with~$d$. We further show that the Hessian spectrum at global and
spurious minima coincide to $O(d^{-1/2})$-order, thus challenging our ability
to argue about statistical generalization through local curvature. Lastly, our
technique provides the exact \emph{fractional} dimensionality at which families
of critical points turn from saddles into spurious minima. This makes possible
the study of the creation and the annihilation of spurious minima using
powerful tools from equivariant bifurcation theory.
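To make the setup concrete, the following is a minimal numerical sketch, not the authors' code, of the optimization problem the abstract describes: a two-layer ReLU student fit to a ReLU target network under the squared loss with standard Gaussian inputs, together with a Monte Carlo estimate of the Hessian spectrum at a candidate minimizer. The target weights `V`, the dimensions `d, k`, and the sample size are illustrative assumptions; the paper's results concern the population loss and the spurious minima it identifies analytically, which this sketch does not locate.

```python
# A minimal sketch (not the authors' code) of the teacher-student setup:
# a two-layer ReLU student fit to a ReLU target network under the squared
# loss, with standard Gaussian inputs x ~ N(0, I_d).
import jax
import jax.numpy as jnp

d, k = 10, 10        # input dimension and hidden width (illustrative; the paper assumes d >= k)
n_samples = 2000     # Monte Carlo sample size (assumed; the paper analyzes the population loss)

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (n_samples, d))   # inputs x ~ N(0, I_d)

V = jnp.eye(k, d)                            # assumed target (teacher) weights
y = jnp.sum(jax.nn.relu(X @ V.T), axis=1)    # labels generated by the target network

def loss(w_flat):
    """Empirical squared loss 0.5 * E[(student(x) - target(x))^2]."""
    W = w_flat.reshape(k, d)
    pred = jnp.sum(jax.nn.relu(X @ W.T), axis=1)
    return 0.5 * jnp.mean((pred - y) ** 2)

# Hessian spectrum at a candidate minimizer; for this construction W = V is a global minimum (zero loss).
w0 = V.reshape(-1)
H = jax.hessian(loss)(w0)                    # (d*k) x (d*k) matrix
eigs = jnp.linalg.eigvalsh(H)                # eigenvalues in ascending order
print("loss at W = V:", float(loss(w0)))
print("5 largest Hessian eigenvalues: ", eigs[-5:])
print("5 smallest Hessian eigenvalues:", eigs[:5])
```

Probing the spurious minima studied in the paper numerically would additionally require locating them, e.g. by gradient descent from suitably symmetric initializations, which is beyond this sketch.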
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions represented by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Information-Theoretic Thresholds for Planted Dense Cycles [52.076657911275525]
We study a random graph model for small-world networks which are ubiquitous in social and biological sciences.
For both detection and recovery of the planted dense cycle, we characterize the information-theoretic thresholds in terms of $n$, $\tau$, and an edge-wise signal-to-noise ratio $\lambda$.
arXiv Detail & Related papers (2024-02-01T03:39:01Z) - Hidden Minima in Two-Layer ReLU Networks [7.23389716633927]
Two types of infinite families of spurious minima, giving one minimum per $d$, were recently found.
The loss at minima belonging to the first type converges to zero as $d$ increases.
In the second type, the loss remains bounded away from zero.
arXiv Detail & Related papers (2023-12-28T04:27:15Z) - Annihilation of Spurious Minima in Two-Layer ReLU Networks [9.695960412426672]
We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss.
We show that adding neurons can turn symmetric spurious minima into saddles.
We also prove the existence of descent directions in certain subspaces arising from the symmetry structure of the loss function.
arXiv Detail & Related papers (2022-10-12T11:04:21Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces to estimate the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale
of Symmetry [9.695960412426672]
We analytically characterize the Hessian at various families of spurious minima.
In particular, we prove that for $d \ge k$ standard Gaussian inputs: (a) of the $dk$ eigenvalues of the Hessian, $dk - O(d)$ concentrate near zero, (b) $\Omega(d)$ of the eigenvalues grow linearly with $k$.
arXiv Detail & Related papers (2020-08-04T20:08:35Z) - Symmetry & critical points for a model shallow neural network [9.695960412426672]
We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ hidden neurons.
We leverage the rich symmetry exhibited by such models to identify various families of critical points.
We show that while the loss function at certain types of spurious minima decays to zero like $k^{-1}$, in other cases the loss converges to a strictly positive constant.
arXiv Detail & Related papers (2020-03-23T23:13:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.