Why Neural Networks Work
- URL: http://arxiv.org/abs/2211.14632v1
- Date: Sat, 26 Nov 2022 18:15:17 GMT
- Title: Why Neural Networks Work
- Authors: Sayandev Mukherjee, Bernardo A. Huberman
- Abstract summary: We argue that many properties of fully-connected feedforward neural networks (FCNNs) are explainable from the analysis of a single pair of operations.
We show how expand-and-sparsify can explain the observed phenomena that have been discussed in the literature.
- Score: 0.32228025627337864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We argue that many properties of fully-connected feedforward neural networks
(FCNNs), also called multi-layer perceptrons (MLPs), are explainable from the
analysis of a single pair of operations, namely a random projection into a
higher-dimensional space than the input, followed by a sparsification
operation. For convenience, we call this pair of successive operations
expand-and-sparsify following the terminology of Dasgupta. We show how
expand-and-sparsify can explain the observed phenomena that have been discussed
in the literature, such as the so-called Lottery Ticket Hypothesis, the
surprisingly good performance of randomly-initialized untrained neural
networks, the efficacy of Dropout in training and most importantly, the
mysterious generalization ability of overparameterized models, first
highlighted by Zhang et al. and subsequently identified even in non-neural
network models by Belkin et al.
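As a concrete illustration of the pair of operations described in the abstract, the following is a minimal sketch of expand-and-sparsify: a fixed random projection into a higher-dimensional space, followed by a sparsification step. The Gaussian weights, the k-winners-take-all rule, and the dimensions used here are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical, minimal expand-and-sparsify sketch (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, k = 64, 2048, 100     # expand 64 -> 2048 dimensions, keep the top 100

# "Expand": a fixed, random (untrained) projection into the wider space.
W = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)

def expand_and_sparsify(x: np.ndarray) -> np.ndarray:
    """Randomly project x into the wider space, then zero all but the k largest activations."""
    h = W @ x                              # expansion: higher-dimensional representation
    top_k = np.argpartition(h, -k)[-k:]    # indices of the k largest pre-activations
    s = np.zeros_like(h)
    s[top_k] = h[top_k]                    # sparsification: k-winners-take-all
    return s

x = rng.standard_normal(d_in)              # a toy input vector
code = expand_and_sparsify(x)
print(f"nonzero entries: {np.count_nonzero(code)} of {d_hidden}")
```

In an actual FCNN layer the sparsification is usually implicit, e.g. a ReLU zeroing out negative pre-activations rather than an explicit top-k rule; the top-k choice above is just one convenient way to make the sparsity level explicit.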
Related papers
- Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets [4.020829863982153]
We prove the existence of structured subnetworks (winning tickets) that can approximate any sufficiently smaller network.
This result provides the first sub-exponential bound for the Strong Lottery Ticket Hypothesis.
arXiv Detail & Related papers (2023-11-16T12:38:45Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- On the limits of neural network explainability via descrambling [2.5554069583567487]
We show that the principal components of the hidden layer preactivations can be characterized as the optimal explainers or descramblers for the layer weights (a toy sketch of such hidden-layer principal components appears after this list).
We show that in typical deep learning contexts these descramblers take diverse and interesting forms.
arXiv Detail & Related papers (2023-01-18T23:16:53Z)
- Consistency of Neural Networks with Regularization [0.0]
This paper proposes a general framework of neural networks with regularization and proves its consistency.
Two types of activation functions are considered: the hyperbolic tangent (Tanh) and the rectified linear unit (ReLU).
arXiv Detail & Related papers (2022-06-22T23:33:39Z)
- PAC-Bayesian Learning of Aggregated Binary Activated Neural Networks with Probabilities over Representations [2.047424180164312]
We study the expectation of a probabilistic neural network as a predictor by itself, focusing on the aggregation of binary activated neural networks with normal distributions over real-valued weights.
We show that the exact computation remains tractable for deep but narrow neural networks, thanks to a dynamic programming approach.
arXiv Detail & Related papers (2021-10-28T14:11:07Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network through the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned (a toy magnitude-pruning sketch appears after this list).
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Learning and Generalization in RNNs [11.107204912245841]
We prove that simple recurrent neural networks can learn functions of sequences.
New ideas enable us to extract information from the hidden state of the RNN in our proofs.
arXiv Detail & Related papers (2021-05-31T18:27:51Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Phase Detection with Neural Networks: Interpreting the Black Box [58.720142291102135]
Neural networks (NNs) usually provide little insight into the reasoning behind their predictions.
We demonstrate how influence functions can unravel the black box of NNs trained to predict the phases of the one-dimensional extended spinless Fermi-Hubbard model at half-filling.
arXiv Detail & Related papers (2020-04-09T17:45:45Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
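For the descrambling entry above ("On the limits of neural network explainability via descrambling"), the sketch below shows what "principal components of the hidden layer preactivations" means concretely for a toy, randomly initialized layer. The data, sizes, and single-layer setup here are hypothetical illustrations, not taken from that paper; its claim is only that such components act as optimal descramblers of the layer weights.

```python
# Illustrative sketch: principal components of a hidden layer's preactivations
# for a toy, randomly initialized layer (hypothetical sizes and data).
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_hidden = 500, 20, 50
X = rng.standard_normal((n, d_in))           # toy input batch
W1 = rng.standard_normal((d_in, d_hidden))   # first-layer weights (untrained)

Z = X @ W1                                   # hidden-layer preactivations, shape (n, d_hidden)
Zc = Z - Z.mean(axis=0)                      # center before PCA
# Principal components = right singular vectors of the centered preactivation matrix.
_, singular_values, Vt = np.linalg.svd(Zc, full_matrices=False)
principal_components = Vt                    # rows are principal directions in hidden space

explained = singular_values**2 / np.sum(singular_values**2)
print("variance explained by the first 5 components:", np.round(explained[:5], 3))
```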
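Similarly, for the lottery-ticket entries above ("Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets" and "Why Lottery Ticket Wins?"), the sketch below shows one common way a "pruned neural network" is obtained in practice: global magnitude pruning of a weight matrix via a binary mask. The layer shape and sparsity level are arbitrary illustrative choices, and this is not the specific pruning procedure analyzed in those papers.

```python
# Illustrative sketch: global magnitude pruning of a toy weight matrix via a binary mask.
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((256, 128))           # a toy dense layer's weights
sparsity = 0.9                                # prune 90% of the weights

threshold = np.quantile(np.abs(W), sparsity)  # magnitude cutoff for pruning
mask = (np.abs(W) >= threshold).astype(W.dtype)
W_pruned = W * mask                           # the surviving subnetwork ("winning ticket" candidate)

print(f"fraction of weights kept: {mask.mean():.3f}")
```

In lottery-ticket experiments the mask is typically derived from a trained network's weights and then applied back to the original initialization; the random weights above are only a stand-in for illustration.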