Complex Critical Points of Deep Linear Neural Networks
- URL: http://arxiv.org/abs/2301.12651v1
- Date: Mon, 30 Jan 2023 04:16:49 GMT
- Title: Complex Critical Points of Deep Linear Neural Networks
- Authors: Ayush Bharadwaj and Serkan Hoşten
- Abstract summary: For networks with a single hidden layer trained on a single data point we give an improved bound on the number of complex critical points of the loss function.
We show that for any number of hidden layers complex critical points with zero coordinates arise in certain patterns which we completely classify for networks with one hidden layer.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We extend the work of Mehta, Chen, Tang, and Hauenstein on computing the
complex critical points of the loss function of deep linear neural networks
when the activation function is the identity function. For networks with a
single hidden layer trained on a single data point we give an improved bound on
the number of complex critical points of the loss function. We show that for
any number of hidden layers complex critical points with zero coordinates arise
in certain patterns which we completely classify for networks with one hidden
layer. We report our results of computational experiments with varying network
architectures defining small deep linear networks using
HomotopyContinuation.jl.
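As an illustration of the computation described in the abstract, below is a minimal sketch of how one might set up and solve the critical-point equations of such a loss with HomotopyContinuation.jl. This is not the authors' code: the 2-2-2 architecture, the data point, the loss normalization, and all variable names are assumptions chosen for illustration only.

```julia
# Minimal sketch: complex critical points of the quadratic loss of a
# one-hidden-layer linear network (identity activation) on one data point.
# Dimensions, data, and normalization are illustrative assumptions.
using HomotopyContinuation

@var W1[1:2, 1:2] W2[1:2, 1:2]   # hidden-layer and output-layer weights

x = [1.0, 2.0]                   # hypothetical input point
y = [1.0, -1.0]                  # hypothetical target

r = W2 * W1 * x - y              # residual of the linear network
L = sum(r .^ 2) / 2              # quadratic loss

vars = vcat(vec(W1), vec(W2))
grad = differentiate(L, vars)    # critical-point equations: gradient of L = 0

# Solve the polynomial system over the complex numbers.
# Note: for linear networks the critical locus can be positive-dimensional,
# so solve() only reports isolated (possibly singular) endpoints.
result = solve(System(grad; variables = vars))
println("complex solutions found: ", nsolutions(result))
```

Deeper architectures follow the same pattern: each additional hidden layer contributes another weight matrix to the product and more unknowns to `vars`, which is the kind of architectural variation the abstract's experiments explore.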
Related papers
- Function Space and Critical Points of Linear Convolutional Networks [4.483341215742946]
We study the geometry of linear networks with one-dimensional convolutional layers.
We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points.
arXiv Detail & Related papers (2023-04-12T10:15:17Z) - Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer [18.06634056613645]
We consider optimizing deep linear networks which have a layer with one neuron under quadratic loss.
We describe the convergent point of trajectories with an arbitrary starting point under gradient flow.
We show specific convergence rates of trajectories that converge to the global minimizer by stages.
arXiv Detail & Related papers (2022-01-08T04:44:59Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Deep Networks Provably Classify Data on Curves [12.309532551321334]
We study a model problem that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere.
We prove that when (i) the network depth is large relative to certain properties that set the difficulty of the problem and (ii) the network width and number of samples are polynomial in the depth, randomly-initialized gradient descent quickly learns to correctly classify all points on the two curves with high probability.
arXiv Detail & Related papers (2021-07-29T20:40:04Z) - Learning distinct features helps, provably [98.78384185493624]
We study the diversity of the features learned by a two-layer neural network trained with the least squares loss.
We measure the diversity by the average $L_2$-distance between the hidden-layer features.
arXiv Detail & Related papers (2021-06-10T19:14:45Z) - Landscape analysis for shallow ReLU neural networks: complete classification of critical points for affine target functions [3.9103337761169947]
We provide a complete classification of the critical points in the case where the target function is affine.
Our approach builds on a careful analysis of the different types of hidden neurons that can occur in a ReLU neural network.
arXiv Detail & Related papers (2021-03-19T17:35:01Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the gradient flow of the loss function.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis [107.9979381402172]
We propose a rotation-invariant deep network for point cloud analysis.
The network is hierarchical and relies on two modules: a positional feature embedding block and a relational feature embedding block.
Experiments show state-of-the-art classification and segmentation performances on benchmark datasets.
arXiv Detail & Related papers (2020-11-18T04:16:51Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Piecewise linear activations substantially shape the loss surfaces of neural networks [95.73230376153872]
This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks.
We first prove that the loss surfaces of many neural networks have infinitely many spurious local minima, defined as local minima with higher empirical risks than the global minima.
For one-hidden-layer networks, we prove that all local minima in a cell constitute an equivalence class; they are concentrated in a valley; and they are all global minima in the cell.
arXiv Detail & Related papers (2020-03-27T04:59:34Z) - Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training [4.7210697296108926]
We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape.
For shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima.
We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points.
arXiv Detail & Related papers (2020-02-07T16:33:34Z)