Topological obstructions in neural networks learning
- URL: http://arxiv.org/abs/2012.15834v1
- Date: Thu, 31 Dec 2020 18:53:25 GMT
- Title: Topological obstructions in neural networks learning
- Authors: Serguei Barannikov, Grigorii Sotnikov, Ilya Trofimov, Alexander
Korotin, Evgeny Burnaev
- Abstract summary: We study global properties of the loss function's gradient flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
- Score: 67.8848058842671
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We apply methods of topological data analysis to loss functions to gain
insight into the learning of deep neural networks and their generalization
properties. We study global properties of the loss function's gradient flow. We
use topological data analysis of the loss function and its Morse complex to
relate local behavior along gradient trajectories to global properties of the
loss surface. We define a neural network Topological Obstructions score
(TO-score) with the help of robust topological invariants, barcodes of the loss
function, which quantify the badness of local minima for gradient-based
optimization. We carried out several experiments computing these invariants
for small neural networks and for fully connected, convolutional, and
ResNet-like neural networks on different datasets: MNIST, Fashion MNIST,
CIFAR10, and SVHN. Our two principal observations are as follows. First, the
neural network barcode and TO-score decrease as the network depth and width
increase. Second, there is an intriguing connection between the lengths of the
minima segments in the barcode and the generalization error of the
corresponding minima.
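Barcodes of a function in this sense are its sublevel-set persistence intervals; in degree zero, each finite bar is born at a local minimum and dies at the value where its basin merges into a deeper one, so the bar length measures how high a gradient-based method would have to climb to escape that minimum, which is the "badness" the TO-score quantifies. As a minimal illustration of this construction (not the authors' code), the sketch below computes such a barcode for a toy one-dimensional loss sampled on a grid using the standard union-find "elder rule"; the function name and the toy loss are illustrative assumptions.

```python
import numpy as np

def sublevel_barcode_1d(loss_values):
    """0-dimensional sublevel-set persistence barcode of a loss sampled on a
    1D grid.  Each finite bar is born at a local minimum and dies at the
    saddle value where its basin merges into a deeper basin, so the bar
    length is the height a gradient method must climb to leave that minimum."""
    n = len(loss_values)
    parent, birth, bars = {}, {}, []

    def find(i):                                   # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in np.argsort(loss_values):              # sweep grid points by increasing loss
        v = loss_values[i]
        parent[i], birth[i] = i, v                 # tentatively start a new basin
        for j in (i - 1, i + 1):                   # merge with already-active neighbours
            if 0 <= j < n and j in parent:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                if birth[ri] > birth[rj]:          # elder rule: the deeper basin survives
                    ri, rj = rj, ri
                if v > birth[rj]:                  # skip zero-length bars (non-minima)
                    bars.append((birth[rj], v))
                parent[rj] = ri
    bars.append((min(birth.values()), np.inf))     # the global minimum's bar never dies
    return bars

# Toy multi-well "loss": each shallow well yields a finite bar.
xs = np.linspace(-3.0, 3.0, 601)
toy_loss = 0.1 * xs**4 - xs**2 + 0.3 * np.sin(5.0 * xs)
for born, died in sorted(sublevel_barcode_1d(toy_loss)):
    print(f"bar: born at {born:+.3f}, dies at {died:+.3f}")
```

The paper applies this kind of analysis to the loss surfaces of actual neural networks; the minima segments whose lengths the abstract relates to generalization error are bars of this degree-zero type.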
Related papers
- Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss [2.07180164747172]
We compare deeper neural networks (DeNNs) with a flexible number of layers against wider neural networks (WeNNs) with a limited number of hidden layers.
We find that a larger parameter budget tends to favor WeNNs, while more sample points and greater regularity in the loss function favor DeNNs.
arXiv Detail & Related papers (2024-01-31T20:10:10Z) - A topological description of loss surfaces based on Betti Numbers [8.539445673580252]
We provide a topological measure to evaluate loss complexity in the case of multilayer neural networks.
We find that certain variations in the loss function or model architecture, such as adding an $\ell$ regularization term or skip connections in a feedforward network, do not affect the loss complexity in specific cases.
arXiv Detail & Related papers (2024-01-08T11:20:04Z) - Topological Expressivity of ReLU Neural Networks [0.0]
We study the expressivity of ReLU neural networks in the setting of a binary classification problem from a topological perspective.
Results show that deep ReLU neural networks are exponentially more powerful than shallow ones in terms of topological simplification.
arXiv Detail & Related papers (2023-10-17T10:28:00Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural
Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z) - Critical Investigation of Failure Modes in Physics-informed Neural
Networks [0.9137554315375919]
We show that a physics-informed neural network with a composite formulation produces highly non-convex loss surfaces that are difficult to optimize.
We also assess the training of both approaches on two elliptic problems with increasingly complex target solutions.
arXiv Detail & Related papers (2022-06-20T18:43:35Z) - A Local Geometric Interpretation of Feature Extraction in Deep
Feedforward Neural Networks [13.159994710917022]
In this paper, we present a local geometric analysis to interpret how deep feedforward neural networks extract low-dimensional features from high-dimensional data.
Our study shows that, in a local geometric region, the optimal weight in one layer of the neural network and the optimal feature generated by the previous layer comprise a low-rank approximation of a matrix that is determined by the Bayes action of this layer.
arXiv Detail & Related papers (2022-02-09T18:50:00Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots can occur at locations distinct from the data points; a minimal training sketch of this two-layer setting appears after this list.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
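As a companion to the "Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks" entry above, the sketch below sets up the basic object it studies: a wide two-layer ReLU network fit to a handful of 1D points, trained here with full-batch gradient descent for simplicity, after which the knots of the learned piecewise-linear function can be read off from the hidden units. The width, data, learning rate, and the knot-counting heuristic are illustrative assumptions, not details taken from that paper.

```python
import numpy as np

# Wide two-layer ReLU network fit to a few 1D points with full-batch gradient
# descent.  All sizes, the data, and the learning rate are illustrative.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 8).reshape(-1, 1)        # n x 1 training inputs
y = np.sin(3.0 * x)                                  # targets

m = 128                                              # hidden width
W = rng.normal(size=(1, m))                          # first-layer weights
b = rng.normal(size=(m,))                            # first-layer biases
a = rng.normal(scale=1.0 / np.sqrt(m), size=(m, 1))  # output weights

lr, steps = 5e-3, 30_000
for _ in range(steps):
    pre = x @ W + b                                  # n x m pre-activations
    h = np.maximum(pre, 0.0)                         # ReLU features
    err = h @ a - y                                  # residuals of the prediction
    # hand-derived gradients of the mean squared error
    grad_a = h.T @ err / len(x)
    grad_h = err @ a.T * (pre > 0)                   # backprop through the ReLU
    W -= lr * (x.T @ grad_h / len(x))
    b -= lr * grad_h.mean(axis=0)
    a -= lr * grad_a

# The learned function is piecewise linear; unit j can place a knot at the
# point where its pre-activation changes sign, x = -b_j / w_j, which need not
# coincide with any training input.  (Units with tiny weights contribute no
# visible knot, so this count is only a rough heuristic.)
knots = -b / W.ravel()
visible = knots[(np.abs(W.ravel()) > 1e-6) & (np.abs(knots) <= 1.0)]
print(f"final MSE: {float(np.mean(err ** 2)):.5f}, candidate knots in [-1, 1]: {visible.size}")
```

Each hidden unit contributes a potential knot where its pre-activation changes sign; the entry's observation is that such knots need not coincide with the training inputs.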