Why Neural Networks Work
- URL: http://arxiv.org/abs/2211.14632v1
- Date: Sat, 26 Nov 2022 18:15:17 GMT
- Title: Why Neural Networks Work
- Authors: Sayandev Mukherjee, Bernardo A. Huberman
- Abstract summary: We argue that many properties of fully-connected feedforward neural networks (FCNNs) are explainable from the analysis of a single pair of operations.
We show how expand-and-sparsify can explain the observed phenomena that have been discussed in the literature.
- Score: 0.32228025627337864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We argue that many properties of fully-connected feedforward neural networks
(FCNNs), also called multi-layer perceptrons (MLPs), are explainable from the
analysis of a single pair of operations, namely a random projection into a
higher-dimensional space than the input, followed by a sparsification
operation. For convenience, we call this pair of successive operations
expand-and-sparsify following the terminology of Dasgupta. We show how
expand-and-sparsify can explain the observed phenomena that have been discussed
in the literature, such as the so-called Lottery Ticket Hypothesis, the
surprisingly good performance of randomly-initialized untrained neural
networks, the efficacy of Dropout in training and most importantly, the
mysterious generalization ability of overparameterized models, first
highlighted by Zhang et al. and subsequently identified even in non-neural
network models by Belkin et al.
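As a concrete illustration of the pair of operations described in the abstract, the following is a minimal sketch of expand-and-sparsify: a fixed random projection into a higher-dimensional space, followed by a sparsification step. The Gaussian weights, the k-winners-take-all rule, and the dimensions used here are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical, minimal expand-and-sparsify sketch (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, k = 64, 2048, 100     # expand 64 -> 2048 dimensions, keep the top 100

# "Expand": a fixed, random (untrained) projection into the wider space.
W = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)

def expand_and_sparsify(x: np.ndarray) -> np.ndarray:
    """Randomly project x into the wider space, then zero all but the k largest activations."""
    h = W @ x                              # expansion: higher-dimensional representation
    top_k = np.argpartition(h, -k)[-k:]    # indices of the k largest pre-activations
    s = np.zeros_like(h)
    s[top_k] = h[top_k]                    # sparsification: k-winners-take-all
    return s

x = rng.standard_normal(d_in)              # a toy input vector
code = expand_and_sparsify(x)
print(f"nonzero entries: {np.count_nonzero(code)} of {d_hidden}")
```

In an actual FCNN layer the sparsification is usually implicit, e.g. a ReLU zeroing out negative pre-activations rather than an explicit top-k rule; the top-k choice above is just one convenient way to make the sparsity level explicit.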
Related papers
- Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets [4.020829863982153]
We prove the existence of structured subnetworks (winning tickets) that can approximate any sufficiently smaller network.
This result provides the first sub-exponential bound for the Strong Lottery Ticket Hypothesis.
arXiv Detail & Related papers (2023-11-16T12:38:45Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- On the limits of neural network explainability via descrambling [2.5554069583567487]
We show that the principal components of the hidden layer preactivations can be characterized as the optimal explainers or descramblers for the layer weights (a toy sketch of such hidden-layer principal components appears after this list).
We show that in typical deep learning contexts these descramblers take diverse and interesting forms.
arXiv Detail & Related papers (2023-01-18T23:16:53Z)
- Consistency of Neural Networks with Regularization [0.0]
This paper proposes a general framework of neural networks with regularization and proves its consistency.
Two types of activation functions are considered: the hyperbolic tangent (Tanh) and the rectified linear unit (ReLU).
arXiv Detail & Related papers (2022-06-22T23:33:39Z)
- PAC-Bayesian Learning of Aggregated Binary Activated Neural Networks with Probabilities over Representations [2.047424180164312]
We study the expectation of a probabilistic neural network as a predictor by itself, focusing on the aggregation of binary activated neural networks with normal distributions over real-valued weights.
We show that the exact computation remains tractable for deep but narrow neural networks, thanks to a dynamic programming approach.
arXiv Detail & Related papers (2021-10-28T14:11:07Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network through the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned (a toy magnitude-pruning sketch appears after this list).
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Learning and Generalization in RNNs [11.107204912245841]
We prove that simple recurrent neural networks can learn functions of sequences.
New ideas enable us to extract information from the hidden state of the RNN in our proofs.
arXiv Detail & Related papers (2021-05-31T18:27:51Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Phase Detection with Neural Networks: Interpreting the Black Box [58.720142291102135]
Neural networks (NNs) usually provide little insight into the reasoning behind their predictions.
We demonstrate how influence functions can unravel the black box of NNs trained to predict the phases of the one-dimensional extended spinless Fermi-Hubbard model at half-filling.
arXiv Detail & Related papers (2020-04-09T17:45:45Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
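For the descrambling entry above ("On the limits of neural network explainability via descrambling"), the sketch below shows what "principal components of the hidden layer preactivations" means concretely for a toy, randomly initialized layer. The data, sizes, and single-layer setup here are hypothetical illustrations, not taken from that paper; its claim is only that such components act as optimal descramblers of the layer weights.

```python
# Illustrative sketch: principal components of a hidden layer's preactivations
# for a toy, randomly initialized layer (hypothetical sizes and data).
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_hidden = 500, 20, 50
X = rng.standard_normal((n, d_in))           # toy input batch
W1 = rng.standard_normal((d_in, d_hidden))   # first-layer weights (untrained)

Z = X @ W1                                   # hidden-layer preactivations, shape (n, d_hidden)
Zc = Z - Z.mean(axis=0)                      # center before PCA
# Principal components = right singular vectors of the centered preactivation matrix.
_, singular_values, Vt = np.linalg.svd(Zc, full_matrices=False)
principal_components = Vt                    # rows are principal directions in hidden space

explained = singular_values**2 / np.sum(singular_values**2)
print("variance explained by the first 5 components:", np.round(explained[:5], 3))
```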
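Similarly, for the lottery-ticket entries above ("Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets" and "Why Lottery Ticket Wins?"), the sketch below shows one common way a "pruned neural network" is obtained in practice: global magnitude pruning of a weight matrix via a binary mask. The layer shape and sparsity level are arbitrary illustrative choices, and this is not the specific pruning procedure analyzed in those papers.

```python
# Illustrative sketch: global magnitude pruning of a toy weight matrix via a binary mask.
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((256, 128))           # a toy dense layer's weights
sparsity = 0.9                                # prune 90% of the weights

threshold = np.quantile(np.abs(W), sparsity)  # magnitude cutoff for pruning
mask = (np.abs(W) >= threshold).astype(W.dtype)
W_pruned = W * mask                           # the surviving subnetwork ("winning ticket" candidate)

print(f"fraction of weights kept: {mask.mean():.3f}")
```

In lottery-ticket experiments the mask is typically derived from a trained network's weights and then applied back to the original initialization; the random weights above are only a stand-in for illustration.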