Complex Critical Points of Deep Linear Neural Networks
- URL: http://arxiv.org/abs/2301.12651v1
- Date: Mon, 30 Jan 2023 04:16:49 GMT
- Title: Complex Critical Points of Deep Linear Neural Networks
- Authors: Ayush Bharadwaj and Serkan Hoşten
- Abstract summary: For networks with a single hidden layer trained on a single data point we give an improved bound on the number of complex critical points of the loss function.
We show that for any number of hidden layers complex critical points with zero coordinates arise in certain patterns which we completely classify for networks with one hidden layer.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We extend the work of Mehta, Chen, Tang, and Hauenstein on computing the
complex critical points of the loss function of deep linear neural networks
when the activation function is the identity function. For networks with a
single hidden layer trained on a single data point we give an improved bound on
the number of complex critical points of the loss function. We show that for
any number of hidden layers complex critical points with zero coordinates arise
in certain patterns which we completely classify for networks with one hidden
layer. We report our results of computational experiments with varying network
architectures defining small deep linear networks using
HomotopyContinuation.jl.
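As an illustration of the computation described in the abstract, below is a minimal sketch of how one might set up and solve the critical-point equations of such a loss with HomotopyContinuation.jl. This is not the authors' code: the 2-2-2 architecture, the data point, the loss normalization, and all variable names are assumptions chosen for illustration only.

```julia
# Minimal sketch: complex critical points of the quadratic loss of a
# one-hidden-layer linear network (identity activation) on one data point.
# Dimensions, data, and normalization are illustrative assumptions.
using HomotopyContinuation

@var W1[1:2, 1:2] W2[1:2, 1:2]   # hidden-layer and output-layer weights

x = [1.0, 2.0]                   # hypothetical input point
y = [1.0, -1.0]                  # hypothetical target

r = W2 * W1 * x - y              # residual of the linear network
L = sum(r .^ 2) / 2              # quadratic loss

vars = vcat(vec(W1), vec(W2))
grad = differentiate(L, vars)    # critical-point equations: gradient of L = 0

# Solve the polynomial system over the complex numbers.
# Note: for linear networks the critical locus can be positive-dimensional,
# so solve() only reports isolated (possibly singular) endpoints.
result = solve(System(grad; variables = vars))
println("complex solutions found: ", nsolutions(result))
```

Deeper architectures follow the same pattern: each additional hidden layer contributes another weight matrix to the product and more unknowns to `vars`, which is the kind of architectural variation the abstract's experiments explore.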
Related papers
- Function Space and Critical Points of Linear Convolutional Networks [4.483341215742946]
We study the geometry of linear networks with one-dimensional convolutional layers.
We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points.
arXiv Detail & Related papers (2023-04-12T10:15:17Z) - Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer [18.06634056613645]
We consider optimizing deep linear networks which have a layer with one neuron under quadratic loss.
We describe the convergent point of trajectories with an arbitrary starting point under gradient flow.
We show specific convergence rates of trajectories that converge to the global minimizer by stages.
arXiv Detail & Related papers (2022-01-08T04:44:59Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Deep Networks Provably Classify Data on Curves [12.309532551321334]
We study a model problem that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere.
We prove that when (i) the network depth is large relative to certain properties that set the difficulty of the problem and (ii) the network width and number of samples are polynomial in the depth, randomly-initialized gradient descent quickly learns to correctly classify all points on the two curves with high probability.
arXiv Detail & Related papers (2021-07-29T20:40:04Z) - Learning distinct features helps, provably [98.78384185493624]
We study the diversity of the features learned by a two-layer neural network trained with the least squares loss.
We measure the diversity by the average $L_2$-distance between the hidden-layer features.
arXiv Detail & Related papers (2021-06-10T19:14:45Z) - Landscape analysis for shallow ReLU neural networks: complete classification of critical points for affine target functions [3.9103337761169947]
We provide a complete classification of the critical points in the case where the target function is affine.
Our approach builds on a careful analysis of the different types of hidden neurons that can occur in a ReLU neural network.
arXiv Detail & Related papers (2021-03-19T17:35:01Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the gradient flow of the loss function.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis [107.9979381402172]
We propose a rotation-invariant deep network for point cloud analysis.
The network is hierarchical and relies on two modules: a positional feature embedding block and a relational feature embedding block.
Experiments show state-of-the-art classification and segmentation performances on benchmark datasets.
arXiv Detail & Related papers (2020-11-18T04:16:51Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Piecewise linear activations substantially shape the loss surfaces of neural networks [95.73230376153872]
This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks.
We first prove that the loss surfaces of many neural networks have infinitely many spurious local minima, defined as local minima with higher empirical risks than the global minima.
For one-hidden-layer networks, we prove that all local minima in a cell constitute an equivalence class; they are concentrated in a valley; and they are all global minima in the cell.
arXiv Detail & Related papers (2020-03-27T04:59:34Z) - Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training [4.7210697296108926]
We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape.
For shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima.
We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points.
arXiv Detail & Related papers (2020-02-07T16:33:34Z)