Landscape analysis for shallow ReLU neural networks: complete
classification of critical points for affine target functions
- URL: http://arxiv.org/abs/2103.10922v1
- Date: Fri, 19 Mar 2021 17:35:01 GMT
- Title: Landscape analysis for shallow ReLU neural networks: complete
classification of critical points for affine target functions
- Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek
- Abstract summary: We provide a complete classification of the critical points in the case where the target function is affine.
Our approach builds on a careful analysis of the different types of hidden neurons that can occur in a ReLU neural network.
- Score: 3.9103337761169947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we analyze the landscape of the true loss of a ReLU neural
network with one hidden layer. We provide a complete classification of the
critical points in the case where the target function is affine. In particular,
we prove that local minima and saddle points have to be of a special form and
show that there are no local maxima. Our approach is of a combinatorial nature
and builds on a careful analysis of the different types of hidden neurons that
can occur in a ReLU neural network.
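To make the main claim concrete, here is a minimal numerical sketch (not the paper's proof technique): it approximates the true $L^2$ loss of a one-hidden-neuron ReLU network against the affine target $f(x) = x$ with inputs uniform on $[-1, 1]$, and checks that a configuration with an inactive ("dead") hidden neuron is a critical point with non-zero loss. The network form and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# True L2 loss of a one-hidden-neuron ReLU network v*relu(w*x + b) + c
# against the affine target f(x) = x, inputs uniform on [-1, 1].
# The population integral is approximated on a fine symmetric grid.
xs = np.linspace(-1.0, 1.0, 20001)

def loss(params):
    w, b, v, c = params
    out = v * np.maximum(w * xs + b, 0.0) + c
    return np.mean((out - xs) ** 2)

def num_grad(params, eps=1e-6):
    """Central finite-difference gradient of the loss."""
    params = np.asarray(params, dtype=float)
    g = np.zeros_like(params)
    for i in range(params.size):
        e = np.zeros_like(params)
        e[i] = eps
        g[i] = (loss(params + e) - loss(params - e)) / (2.0 * eps)
    return g

# The hidden neuron is inactive on all of [-1, 1], since w*x + b <= -1 < 0,
# and c = 0 is the best constant output for this symmetric target.
dead = np.array([1.0, -2.0, 3.0, 0.0])
print("loss:", loss(dead))        # ~1/3: far from the global minimum
print("grad:", num_grad(dead))    # ~[0, 0, 0, 0]: a non-global critical point
```

The printed gradient is numerically zero in every coordinate while the loss stays near $1/3$, so this is a non-global critical point; its degenerate, inactive-neuron structure is consistent with the special forms of critical points the paper classifies.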
Related papers
- Topological obstruction to the training of shallow ReLU neural networks [0.0]
We study the interplay between the geometry of the loss landscape and the optimization trajectories of simple neural networks.
This paper reveals the presence of a topological obstruction in the loss landscape of shallow ReLU neural networks trained using gradient flow.
arXiv Detail & Related papers (2024-10-18T19:17:48Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer [18.06634056613645]
We consider optimizing deep linear networks which have a layer with one neuron under quadratic loss.
We describe the convergent point of trajectories with an arbitrary starting point under gradient flow.
We show specific convergence rates for trajectories that converge to the global minimizer by stages.
arXiv Detail & Related papers (2022-01-08T04:44:59Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Non-asymptotic Excess Risk Bounds for Classification with Deep Convolutional Neural Networks [6.051520664893158]
We consider the problem of binary classification with a class of general deep convolutional neural networks.
We define the prefactors of the risk bounds in terms of the input data dimension and other model parameters.
We show that the classification methods with CNNs can circumvent the curse of dimensionality.
arXiv Detail & Related papers (2021-05-01T15:55:04Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Expressivity of Deep Neural Networks [2.7909470193274593]
In this review paper, we give a comprehensive overview of the large variety of approximation results for neural networks.
While the main body of existing results covers general feedforward architectures, we also describe approximation results for convolutional, residual, and recurrent neural networks.
arXiv Detail & Related papers (2020-07-09T13:08:01Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints; a toy version of such a program is sketched after this list.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis, which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)
- Piecewise linear activations substantially shape the loss surfaces of neural networks [95.73230376153872]
This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks.
We first prove that the loss surfaces of many neural networks have infinitely many spurious local minima, defined as local minima with higher empirical risk than the global minima.
For one-hidden-layer networks, we prove that all local minima in a cell constitute an equivalence class; they are concentrated in a valley; and they are all global minima in the cell.
arXiv Detail & Related papers (2020-03-27T04:59:34Z)
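Finally, to give the convex-reformulation entry above ("The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks") some shape, here is a heavily simplified sketch of a convex program with cone constraints of the kind that line of work studies. It subsamples ReLU activation patterns instead of enumerating them all, so it only illustrates the structure of the program; the variable names, the subsampling heuristic, and the regularization constant `beta` are assumptions of this sketch, not taken from the paper.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 20, 3                     # samples, input dimension
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
beta = 1e-3                      # hypothetical weight-decay strength

# Sample candidate ReLU activation patterns D_i = diag(1[X u >= 0]).
# (An exact reformulation enumerates all patterns; we subsample for brevity.)
U = rng.standard_normal((d, 50))
D = np.unique((X @ U >= 0).astype(float), axis=1)   # n x P pattern matrix
P = D.shape[1]

V = cp.Variable((d, P))          # "positive" neuron weights
W = cp.Variable((d, P))          # "negative" neuron weights
fit = cp.sum(cp.multiply(D, X @ (V - W)), axis=1) - y
reg = cp.sum(cp.norm(V, axis=0)) + cp.sum(cp.norm(W, axis=0))

# Cone constraints force each pattern to be realizable by an actual ReLU.
cons = []
for i in range(P):
    K = (2.0 * np.diag(D[:, i]) - np.eye(n)) @ X
    cons += [K @ V[:, i] >= 0, K @ W[:, i] >= 0]

prob = cp.Problem(cp.Minimize(cp.sum_squares(fit) + beta * reg), cons)
prob.solve()
print("optimal value:", prob.value)
```

Each column pair $(v_i, w_i)$ plays the role of a hidden neuron whose activation pattern on the data is pinned by the corresponding cone constraint; per the abstract quoted above, with full pattern enumeration a program of this form recovers all globally optimal two-layer ReLU networks.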