Linear discriminant initialization for feed-forward neural networks
- URL: http://arxiv.org/abs/2007.12782v2
- Date: Tue, 18 Aug 2020 17:12:23 GMT
- Title: Linear discriminant initialization for feed-forward neural networks
- Authors: Marissa Masden, Dev Sinha
- Abstract summary: We initialize the first layer of a neural network using the linear discriminants which best distinguish individual classes.
Networks initialized in this way take fewer training steps to reach the same level of training.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Informed by the basic geometry underlying feed forward neural networks, we
initialize the weights of the first layer of a neural network using the linear
discriminants which best distinguish individual classes. Networks initialized
in this way take fewer training steps to reach the same level of training, and
asymptotically have higher accuracy on training data.
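The abstract does not spell out how the discriminants are computed; below is a minimal sketch of one plausible reading, using a one-vs-rest Fisher linear discriminant per class as the first-layer weights (the function name, the ridge term, and the tiling-plus-noise step are illustrative assumptions, not the paper's stated procedure):

```python
import jax.numpy as jnp
from jax import random

def lda_init_first_layer(X, y, n_hidden, key, ridge=1e-3):
    """Sketch: one-vs-rest Fisher discriminant directions as first-layer weights."""
    dirs = []
    for c in jnp.unique(y):
        mask = y == c
        mu_in, mu_out = X[mask].mean(0), X[~mask].mean(0)
        # Pooled within-group scatter; the ridge term keeps the solve
        # well conditioned when features are strongly correlated.
        Sw = jnp.cov(X[mask], rowvar=False) + jnp.cov(X[~mask], rowvar=False)
        w = jnp.linalg.solve(Sw + ridge * jnp.eye(X.shape[1]), mu_in - mu_out)
        dirs.append(w / jnp.linalg.norm(w))
    # Tile the class directions to fill n_hidden units; small noise
    # breaks ties between repeated copies of the same discriminant.
    W = jnp.stack([dirs[i % len(dirs)] for i in range(n_hidden)])
    return W + 0.01 * random.normal(key, W.shape)
```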
Related papers
- Early alignment in two-layer networks training is a two-edged sword [24.43739371803548]
Training neural networks with first order optimisation methods is at the core of the empirical success of deep learning.
Small initialisations are generally associated with a feature learning regime, for which gradient descent is implicitly biased towards simple solutions.
This work provides a general and quantitative description of the early alignment phase, originally introduced by Maennel et al.
arXiv Detail & Related papers (2024-01-19T16:23:53Z) - Fundamental limits of overparametrized shallow neural networks for
supervised learning [11.136777922498355]
We study a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture.
Our results come in the form of bounds relating i) the mutual information between training data and network weights, or ii) the Bayes-optimal generalization error.
arXiv Detail & Related papers (2023-07-11T08:30:50Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Lojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Dynamic Neural Diversification: Path to Computationally Sustainable
Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
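The paper's exact parameterization is not reproduced in this summary; as a hedged illustration of the line case, one can hold two endpoint parameter sets and evaluate the network at a randomly drawn point on the segment between them at each training step (init_endpoints and forward_on_line are hypothetical names):

```python
import jax, jax.numpy as jnp
from jax import random

def init_endpoints(key, in_dim, hidden, out_dim):
    # Two independent endpoint parameter sets for the same architecture.
    k = random.split(key, 4)
    mk = lambda kk, m, n: 0.05 * random.normal(kk, (m, n))
    return ((mk(k[0], in_dim, hidden), mk(k[1], hidden, out_dim)),
            (mk(k[2], in_dim, hidden), mk(k[3], hidden, out_dim)))

def forward_on_line(endpoints, alpha, x):
    # Evaluate at (1 - alpha) * theta_1 + alpha * theta_2; sampling
    # alpha anew each step trains the whole segment, not one point.
    p1, p2 = endpoints
    w1, w2 = jax.tree_util.tree_map(lambda a, b: (1 - alpha) * a + alpha * b, p1, p2)
    return jnp.tanh(x @ w1) @ w2
```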
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - An Effective and Efficient Initialization Scheme for Training
Multi-layer Feedforward Neural Networks [5.161531917413708]
We propose a novel network initialization scheme based on the celebrated Stein's identity.
The proposed method, termed SteinGLM, is shown through extensive numerical results to be much faster and more accurate than other popular methods commonly used for training neural networks.
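The SteinGLM procedure itself is not given in this summary; as background only, Stein's identity for a standard Gaussian Z states E[Z f(Z)] = E[f'(Z)], which a quick Monte Carlo check illustrates (a sketch of the identity, not of the paper's method):

```python
import jax, jax.numpy as jnp
from jax import random

# Stein's identity for Z ~ N(0, 1): E[Z f(Z)] = E[f'(Z)].
f = jnp.tanh
z = random.normal(random.PRNGKey(0), (1_000_000,))
lhs = jnp.mean(z * f(z))                  # E[Z f(Z)]
rhs = jnp.mean(jax.vmap(jax.grad(f))(z))  # E[f'(Z)]
print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```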
arXiv Detail & Related papers (2020-05-16T16:17:37Z) - Taylorized Training: Towards Better Approximation of Neural Network
Training at Finite Width [116.69845849754186]
Taylorized training involves training the $k$-th order Taylor expansion of the neural network.
We show that Taylorized training agrees with full neural network training increasingly better as we increase $k$.
We complement our experiments with theoretical results showing that the approximation error of $k$-th order Taylorized models decays exponentially with $k$ in wide neural networks.
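For the first-order case ($k=1$), Taylorized training amounts to training the linearization of the network around its frozen initialization; a minimal JAX sketch of that case (net and taylorize are assumed names, not from the paper):

```python
import jax, jax.numpy as jnp

def net(params, x):
    w1, w2 = params
    return jnp.tanh(x @ w1) @ w2

def taylorize(f, params0):
    # k = 1: f_lin(p, x) = f(p0, x) + J_p f(p0, x) (p - p0),
    # computed with a single jvp around the frozen initialization p0.
    def f_lin(params, x):
        dp = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
        y0, dy = jax.jvp(lambda p: f(p, x), (params0,), (dp,))
        return y0 + dy
    return f_lin
```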
arXiv Detail & Related papers (2020-02-10T18:37:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.