A Neural Network Based on First Principles
- URL: http://arxiv.org/abs/2002.07469v1
- Date: Tue, 18 Feb 2020 10:16:59 GMT
- Title: A Neural Network Based on First Principles
- Authors: Paul M Baggenstoss
- Abstract summary: A Neural network is derived from first principles, assuming only that each layer begins with a linear dimension-reducing transformation.
The approach appeals to the principle of Maximum Entropy (MaxEnt) to find the posterior distribution of the input data of each layer.
- Score: 13.554038901140949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a Neural network is derived from first principles, assuming
only that each layer begins with a linear dimension-reducing transformation.
The approach appeals to the principle of Maximum Entropy (MaxEnt) to find the
posterior distribution of the input data of each layer, conditioned on the
layer output variables. This posterior has a well-defined mean, the conditional
mean estimator, that is calculated using a type of neural network with
theoretically-derived activation functions similar to sigmoid, softplus, and
relu. This implicitly provides a theoretical justification for their use. A
theorem that finds the conditional distribution and conditional mean estimator
under the MaxEnt prior is proposed, unifying results for special cases.
Combining layers results in an auto-encoder with conventional feed-forward
analysis network and a type of linear Bayesian belief network in the
reconstruction path.
Related papers
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z) - Towards Training Without Depth Limits: Batch Normalization Without
Gradient Explosion [83.90492831583997]
We show that a batch-normalized network can keep the optimal signal propagation properties, but avoid exploding gradients in depth.
We use a Multi-Layer Perceptron (MLP) with linear activations and batch-normalization that provably has bounded depth.
We also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.
arXiv Detail & Related papers (2023-10-03T12:35:02Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $ell_infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over- parameterization, where the width is $tildemathcalO(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - A Local Geometric Interpretation of Feature Extraction in Deep
Feedforward Neural Networks [13.159994710917022]
In this paper, we present a local geometric analysis to interpret how deep feedforward neural networks extract low-dimensional features from high-dimensional data.
Our study shows that, in a local geometric region, the optimal weight in one layer of the neural network and the optimal feature generated by the previous layer comprise a low-rank approximation of a matrix that is determined by the Bayes action of this layer.
arXiv Detail & Related papers (2022-02-09T18:50:00Z) - Critical Initialization of Wide and Deep Neural Networks through Partial
Jacobians: General Theory and Applications [6.579523168465526]
We introduce emphpartial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0leq l$.
We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections.
arXiv Detail & Related papers (2021-11-23T20:31:42Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - The edge of chaos: quantum field theory and deep neural networks [0.0]
We explicitly construct the quantum field theory corresponding to a general class of deep neural networks.
We compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$.
Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.
arXiv Detail & Related papers (2021-09-27T18:00:00Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - On Approximation Capabilities of ReLU Activation and Softmax Output
Layer in Neural Networks [6.852561400929072]
We prove that a sufficiently large neural network using the ReLU activation function can approximate any function in $L1$ up to any arbitrary precision.
We also show that a large enough neural network using a nonlinear softmax output layer can also approximate any indicator function in $L1$.
arXiv Detail & Related papers (2020-02-10T19:48:47Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient combined nonvolutionity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.