Correlation between entropy and generalizability in a neural network
- URL: http://arxiv.org/abs/2207.01996v1
- Date: Tue, 5 Jul 2022 12:28:13 GMT
- Title: Correlation between entropy and generalizability in a neural network
- Authors: Ge Zhang
- Abstract summary: We use the Wang-Landau Monte Carlo algorithm to calculate the entropy at a given test accuracy.
Our results show that entropic forces help generalizability.
- Score: 9.223853439465582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although neural networks can solve very complex machine-learning problems,
the theoretical reason for their generalizability is still not fully
understood. Here we use the Wang-Landau Monte Carlo algorithm to calculate the
entropy (logarithm of the volume of a part of the parameter space) at a given
test accuracy, and a given training loss function value or training accuracy.
Our results show that entropic forces help generalizability. Although our
study is on a very simple application of neural networks (a spiral dataset and
a small, fully-connected neural network), our approach should be useful in
explaining the generalizability of more complicated neural networks in future
works.
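
The abstract describes estimating entropy, the logarithm of the volume of parameter space, with Wang-Landau sampling. The paper's own implementation is not shown on this page, so the following is only a minimal sketch of the general technique on a toy problem chosen for illustration: counting bit strings of length `n_bits` by their number of ones, where the exact answer is a binomial coefficient. The function name `wang_landau`, its parameters, and the toy "energy" are all assumptions; in the paper's setting the levels would instead be bins of test accuracy and training loss over network weights.

```python
import math
import random


def wang_landau(n_bits=10, log_f_final=1e-3, flatness=0.8, seed=0):
    """Estimate log g(k): the log-count of n_bits-long bit strings with k ones.

    Toy stand-in (an assumption, not the paper's setup) for the entropy
    of a region of parameter space at a given level of some observable.
    """
    rng = random.Random(seed)
    state = [0] * n_bits
    k = 0                           # current level: number of ones
    log_g = [0.0] * (n_bits + 1)    # running entropy estimate per level
    hist = [0] * (n_bits + 1)       # visit histogram for the flatness check
    log_f = 1.0                     # modification factor, in log space

    while log_f > log_f_final:
        for _ in range(10_000):
            i = rng.randrange(n_bits)            # propose flipping one bit
            k_new = k + (1 if state[i] == 0 else -1)
            # Accept with probability min(1, g(k) / g(k_new)), so that
            # levels with a low current entropy estimate are favoured.
            if math.log(1.0 - rng.random()) < log_g[k] - log_g[k_new]:
                state[i] ^= 1
                k = k_new
            log_g[k] += log_f                    # penalise the visited level
            hist[k] += 1
        # When every level has been visited roughly equally often,
        # reset the histogram and refine the modification factor.
        if min(hist) > flatness * sum(hist) / len(hist):
            hist = [0] * len(hist)
            log_f /= 2.0

    # Wang-Landau fixes log g only up to an additive constant; shift it
    # so that the total number of states equals 2 ** n_bits.
    peak = max(log_g)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in log_g))
    shift = n_bits * math.log(2) - log_total
    return [v + shift for v in log_g]
```

Calling `wang_landau()` returns a list whose entry `k` approximates `math.log(math.comb(n_bits, k))`. The residual error shrinks as `log_f_final` is lowered, at the cost of a longer run.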
Related papers
- SGD method for entropy error function with smoothing l0 regularization for neural networks [3.108634881604788]
The entropy error function has been widely used in neural networks.
We propose a novel entropy function with smoothing l0 regularization for feed-forward neural networks.
Our work is novel as it enables neural networks to learn effectively, producing more accurate predictions.
arXiv Detail & Related papers (2024-05-28T19:54:26Z) - Verified Neural Compressed Sensing [58.98637799432153]
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task.
We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements.
We show that the complexity of the network can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
arXiv Detail & Related papers (2024-05-07T12:20:12Z) - Points of non-linearity of functions generated by random neural networks [0.0]
We consider functions from the real numbers to the real numbers, output by a neural network with 1 hidden activation layer, arbitrary width, and ReLU activation function.
We compute the expected distribution of the points of non-linearity.
arXiv Detail & Related papers (2023-04-19T17:40:19Z) - Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - A Derivation of Feedforward Neural Network Gradients Using Fréchet Calculus [0.0]
We show a derivation of the gradients of feedforward neural networks using Fréchet calculus.
We show how our analysis generalizes to more general neural network architectures including, but not limited to, convolutional networks.
arXiv Detail & Related papers (2022-09-27T08:14:00Z) - Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal mini-max rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Robust Generalization of Quadratic Neural Networks via Function Identification [19.87036824512198]
Generalization bounds from learning theory often assume that the test distribution is close to the training distribution.
We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
arXiv Detail & Related papers (2021-09-22T18:02:00Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.