Topology of deep neural networks
- URL: http://arxiv.org/abs/2004.06093v1
- Date: Mon, 13 Apr 2020 17:53:36 GMT
- Title: Topology of deep neural networks
- Authors: Gregory Naitzat, Andrey Zhitnikov, and Lek-Heng Lim
- Abstract summary: We study how the topology of a data set $M = M_a \cup M_b \subseteq \mathbb{R}^d$ changes as it passes through the layers of a well-trained neural network.
- Score: 8.946655323517092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study how the topology of a data set $M = M_a \cup M_b \subseteq
\mathbb{R}^d$, representing two classes $a$ and $b$ in a binary classification
problem, changes as it passes through the layers of a well-trained neural
network, i.e., with perfect accuracy on the training set and near-zero
generalization error ($\approx 0.01\%$). The goal is to shed light on two
mysteries in deep neural networks: (i) a nonsmooth activation function like
ReLU outperforms a smooth one like hyperbolic tangent; (ii) successful neural
network architectures rely on having many layers, even though a shallow network
can approximate any function arbitrarily well. We performed extensive experiments
on the persistent homology of a wide range of point cloud data sets, both real
and simulated. The results consistently demonstrate the following: (1) Neural
networks operate by changing topology, transforming a topologically complicated
data set into a topologically simple one as it passes through the layers. No
matter how complicated the topology of $M$ we begin with, when passed through a
well-trained neural network $f : \mathbb{R}^d \to \mathbb{R}^p$, there is a
vast reduction in the Betti numbers of both components $M_a$ and $M_b$; in fact
they nearly always reduce to their lowest possible values:
$\beta_k\bigl(f(M_i)\bigr) = 0$ for $k \ge 1$ and $\beta_0\bigl(f(M_i)\bigr) =
1$, $i = a, b$. Furthermore, (2) the reduction in Betti numbers is significantly
faster for ReLU activation than hyperbolic tangent activation as the former
defines nonhomeomorphic maps that change topology, whereas the latter defines
homeomorphic maps that preserve topology. Lastly, (3) shallow and deep networks
transform data sets differently -- a shallow network operates mainly through
changing geometry and changes topology only in its final layers, a deep one
spreads topological changes more evenly across all layers.
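The measurement pipeline behind these findings is easy to prototype. The sketch below is not the authors' code: it pushes a two-class point cloud through a small fully connected network layer by layer and estimates the Betti numbers of each class image with persistent homology. It assumes NumPy and the ripser.py package are available; the random-weight network, the toy circle-and-blob data, and the single filtration scale `eps` are illustrative stand-ins for the well-trained networks and the persistence-based Betti estimation used in the paper.

```python
# Minimal sketch of the pipeline described in the abstract: track the Betti
# numbers (beta_0, beta_1) of each class M_a, M_b after every layer.
# NOTE: random weights, toy data, and a fixed scale `eps` are illustrative
# stand-ins, not the paper's trained networks or its exact homology estimator.
import numpy as np
from ripser import ripser  # ripser.py: Vietoris-Rips persistent homology


def betti_at_scale(X, eps, maxdim=1):
    """Crude Betti estimate: count persistence intervals born by `eps`
    and still alive at `eps`, one count per homology dimension."""
    dgms = ripser(X, maxdim=maxdim)["dgms"]
    return [int(np.sum((d[:, 0] <= eps) & (d[:, 1] > eps))) for d in dgms]


def layer_images(X, weights, activation):
    """Yield the image of the point cloud X after each fully connected layer."""
    H = X
    for W, b in weights:
        H = activation(H @ W + b)
        yield H


relu = lambda Z: np.maximum(Z, 0.0)

# Toy classes: M_a is a noisy circle (beta_0 = 1, beta_1 = 1),
# M_b a blob inside it (beta_0 = 1, beta_1 = 0).
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
M_a = np.c_[np.cos(theta), np.sin(theta)] + 0.02 * rng.standard_normal((200, 2))
M_b = 0.1 * rng.standard_normal((200, 2))

# Hypothetical stand-in for a trained network: three hidden layers of width 16.
widths = [2, 16, 16, 16]
weights = [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
           for m, n in zip(widths[:-1], widths[1:])]

for name, act in [("relu", relu), ("tanh", np.tanh)]:
    print(f"--- activation: {name} ---")
    for label, M in [("M_a", M_a), ("M_b", M_b)]:
        for depth, H in enumerate(layer_images(M, weights, act), start=1):
            betti = tuple(betti_at_scale(H, eps=0.5))
            print(f"{label}  layer {depth}  (beta_0, beta_1) ~ {betti}")
```

With trained weights in place of the random ones, the printed $(\beta_0, \beta_1)$ pairs would be expected to fall to $(1, 0)$ for each class as depth increases, and to do so in fewer layers for ReLU than for tanh, which is the behavior the abstract reports.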
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m \geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - Implicit Hypersurface Approximation Capacity in Deep ReLU Networks [0.0]
We develop a geometric approximation theory for deep feed-forward neural networks with ReLU activations.
We show that a deep fully-connected ReLU network of width $d+1$ can implicitly construct an approximation of a hypersurface as its zero contour.
arXiv Detail & Related papers (2024-07-04T11:34:42Z) - Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Differential Equation Scaling Limits of Shaped and Unshaped Neural Networks [8.716913598251386]
We find a similar differential-equation-based characterization for two types of unshaped networks.
We derive the first-order correction to the layerwise correlation.
These results together provide a connection between shaped and unshaped network architectures.
arXiv Detail & Related papers (2023-10-18T16:15:10Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two classes linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions [7.943024117353317]
We consider a family of deep neural networks consisting of two groups of convolutional layers, a down operator, and a fully connected layer.
The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer.
arXiv Detail & Related papers (2021-07-02T08:22:12Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - A simple geometric proof for the benefit of depth in ReLU networks [57.815699322370826]
We present a simple proof for the benefit of depth in multi-layer feedforward networks with rectified activation ("depth separation").
We present a concrete neural network with linear depth (in $m$) and small constant width ($\leq 4$) that classifies the problem with zero error.
arXiv Detail & Related papers (2021-01-18T15:40:27Z) - A Corrective View of Neural Networks: Representation, Memorization and Learning [26.87238691716307]
We develop a corrective mechanism for neural network approximation.
We show that two-layer neural networks in the random features regime (RF) can memorize arbitrary labels.
We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions.
arXiv Detail & Related papers (2020-02-01T20:51:09Z) - Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.