A Local Geometric Interpretation of Feature Extraction in Deep
Feedforward Neural Networks
- URL: http://arxiv.org/abs/2202.04632v2
- Date: Thu, 10 Feb 2022 18:57:31 GMT
- Title: A Local Geometric Interpretation of Feature Extraction in Deep
Feedforward Neural Networks
- Authors: Md Kamran Chowdhury Shisher, Tasmeen Zaman Ornee, and Yin Sun
- Abstract summary: In this paper, we present a local geometric analysis to interpret how deep feedforward neural networks extract low-dimensional features from high-dimensional data.
Our study shows that, in a local geometric region, the optimal weight in one layer of the neural network and the optimal feature generated by the previous layer comprise a low-rank approximation of a matrix that is determined by the Bayes action of this layer.
- Score: 13.159994710917022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a local geometric analysis to interpret how deep
feedforward neural networks extract low-dimensional features from
high-dimensional data. Our study shows that, in a local geometric region, the
optimal weight in one layer of the neural network and the optimal feature
generated by the previous layer comprise a low-rank approximation of a matrix
that is determined by the Bayes action of this layer. This result holds (i) for
analyzing both the output layer and the hidden layers of the neural network,
and (ii) for neuron activation functions with non-vanishing gradients. We use
two supervised learning problems to illustrate our results: neural network
based maximum likelihood classification (i.e., softmax regression) and neural
network based minimum mean square estimation. Experimental validation of these
theoretical results will be conducted in our future work.
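As a rough illustration of the low-rank approximation idea in the abstract, the sketch below factors a matrix into a "weight" factor and a "feature" factor using a truncated SVD. It is only a minimal sketch under stated assumptions: the matrix `B` is a random placeholder rather than the Bayes-action-determined matrix from the paper, the Frobenius norm is assumed as the approximation criterion, and the names `low_rank_factors`, `W`, `F`, and `k` are hypothetical and not taken from the paper.

```python
import numpy as np


def low_rank_factors(B, k):
    """Split B into W (m x k) and F (k x n) so that W @ F is the best
    rank-k approximation of B in Frobenius norm (Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    root = np.sqrt(s[:k])
    W = U[:, :k] * root            # scale the leading k left singular vectors
    F = root[:, None] * Vt[:k]     # scale the leading k right singular vectors
    return W, F


# Hypothetical stand-in for the matrix being approximated (not from the paper).
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 8))
k = 2

W, F = low_rank_factors(B, k)
approx = W @ F

# The approximation error equals the energy in the discarded singular values.
err = np.linalg.norm(B - approx)
s = np.linalg.svd(B, compute_uv=False)
tail = np.sqrt(np.sum(s[k:] ** 2))
print(f"rank-{k} error: {err:.4f}, discarded singular-value energy: {tail:.4f}")
```

The split of the singular values evenly between `W` and `F` is an arbitrary choice for this sketch; any factorization of the same rank-`k` product gives the same approximation error.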
Related papers
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal mini-max rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z)
- Applicability of Random Matrix Theory in Deep Learning [0.966840768820136]
We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks.
Our results shed new light on the applicability of Random Matrix Theory to modelling neural networks.
We propose a novel model for the true loss surfaces of neural networks.
arXiv Detail & Related papers (2021-02-12T19:49:19Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Theoretical Analysis of the Advantage of Deepening Neural Networks [0.0]
It is important to know the expressivity of functions computable by deep neural networks.
Under the two criteria, we show that increasing the number of layers is more effective than increasing the number of units per layer for improving the expressivity of deep neural networks.
arXiv Detail & Related papers (2020-09-24T04:10:50Z)