An Effective and Efficient Initialization Scheme for Training
Multi-layer Feedforward Neural Networks
- URL: http://arxiv.org/abs/2005.08027v3
- Date: Thu, 25 Jun 2020 12:51:00 GMT
- Title: An Effective and Efficient Initialization Scheme for Training
Multi-layer Feedforward Neural Networks
- Authors: Zebin Yang, Hengtao Zhang, Agus Sudjianto, Aijun Zhang
- Abstract summary: We propose a novel network initialization scheme based on the celebrated Stein's identity.
The proposed SteinGLM method is shown through extensive numerical results to be much faster and more accurate than other popular methods commonly used for training neural networks.
- Score: 5.161531917413708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network initialization is the first and critical step for training neural
networks. In this paper, we propose a novel network initialization scheme based
on the celebrated Stein's identity. By viewing multi-layer feedforward neural
networks as cascades of multi-index models, the projection weights to the first
hidden layer are initialized using eigenvectors of the cross-moment matrix
between the input's second-order score function and the response. The input
data is then forward propagated to the next layer and such a procedure can be
repeated until all the hidden layers are initialized. Finally, the weights for
the output layer are initialized by generalized linear modeling. The proposed
SteinGLM method is shown through extensive numerical results to be
much faster and more accurate than other popular methods commonly used for
training neural networks.
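The abstract describes the procedure only at a high level. Below is a minimal NumPy sketch of that layer-by-layer scheme, assuming standardized, approximately Gaussian inputs so that the second-order score function takes the simple form x x^T - I. The helper names, the re-standardization between layers, and the plain least-squares output fit are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
import numpy as np

def stein_init_layer(H, y, n_hidden):
    """Sketch: initialize one layer's projection weights via Stein's identity.

    Assumes the inputs H are standardized and roughly Gaussian, so the
    second-order score function is S2(h) = h h^T - I (an assumption made
    here for illustration)."""
    n, d = H.shape
    # Cross-moment matrix M = E[y * (h h^T - I)] between the response and
    # the input's second-order score function.
    M = (H * y[:, None]).T @ H / n - np.mean(y) * np.eye(d)
    # Leading eigenvectors of M serve as the projection weights (d x n_hidden).
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(-np.abs(eigvals))[:n_hidden]
    return eigvecs[:, top]

def steinglm_init(X, y, hidden_sizes, activation=np.tanh):
    """Layer-by-layer sketch: initialize a layer, forward propagate, repeat;
    the output layer is then fit by a generalized linear model -- here a
    plain least-squares fit, i.e. a GLM with identity link."""
    weights, H = [], X
    for width in hidden_sizes:
        W = stein_init_layer(H, y, width)
        weights.append(W)
        H = activation(H @ W)                      # propagate to the next layer
        H = (H - H.mean(0)) / (H.std(0) + 1e-8)    # re-standardize (assumption)
    H1 = np.column_stack([H, np.ones(len(H))])     # add intercept column
    beta, *_ = np.linalg.lstsq(H1, y, rcond=None)  # output-layer weights
    return weights, beta
```

A call like `weights, beta = steinglm_init(X, y, hidden_sizes=[64, 32])` would produce starting values that then replace the usual random initialization before ordinary gradient-based training, which is where the reported speed-up is claimed to come from.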
Related papers
- An Initialization Schema for Neuronal Networks on Tabular Data [0.9155684383461983]
We show that a binomial neural network can be used effectively on tabular data.
The proposed schema offers a simple but effective way to initialize the first hidden layer of a neural network.
We evaluate our approach on multiple public datasets and showcase the improved performance compared to other neural network-based approaches.
arXiv Detail & Related papers (2023-11-07T13:52:35Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Sparse tree-based initialization for neural networks [0.0]
Dedicated neural network (NN) architectures, such as CNNs for images or RNNs for text, are designed to handle specific data types.
In this work, we propose a new initialization technique for (potentially deep) multilayer perceptrons (MLPs).
We show that our new initializer performs an implicit regularization during NN training and emphasize that the first layers act as a sparse feature extractor.
arXiv Detail & Related papers (2022-09-30T07:44:03Z)
- Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring [87.97330195531029]
We propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data.
The proposed NeurMAP can be applied to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets.
arXiv Detail & Related papers (2022-04-26T08:09:47Z)
- Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via the Polyak-Lojasiewicz condition, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- A Greedy Algorithm for Quantizing Neural Networks [4.683806391173103]
We propose a new computationally efficient method for quantizing the weights of pre-trained neural networks.
Our method deterministically quantizes layers in an iterative fashion with no complicated re-training required.
arXiv Detail & Related papers (2020-10-29T22:53:10Z)
- Linear discriminant initialization for feed-forward neural networks [0.0]
We initialize the first layer of a neural network using the linear discriminants that best distinguish the individual classes; a short sketch of this idea is given after the list below.
Networks initialized in this way take fewer training steps to reach the same level of training loss.
arXiv Detail & Related papers (2020-07-24T21:53:48Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity makes learning sensitive to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
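The linear-discriminant entry above lends itself to a concrete illustration. Below is a minimal sketch of initializing a first hidden layer from class-separating discriminant directions, using scikit-learn's LinearDiscriminantAnalysis; padding the remaining hidden units with small random weights is an illustrative choice for this sketch, not necessarily the paper's exact recipe.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_first_layer_init(X, y, n_hidden, rng=None):
    """Sketch: use per-class linear discriminant directions as the first
    hidden-layer weights; any remaining units get small random weights
    (an illustrative choice, not necessarily the paper's exact recipe)."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    lda = LinearDiscriminantAnalysis().fit(X, y)
    directions = lda.coef_.T                 # shape (d, n_classes) or (d, 1)
    k = min(n_hidden, directions.shape[1])
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, n_hidden))
    W[:, :k] = directions[:, :k]             # overwrite with discriminants
    return W

# Example usage (hypothetical data): the returned W would replace the random
# first-layer weight matrix of an MLP before ordinary gradient training.
# W = lda_first_layer_init(X_train, y_train, n_hidden=64)
```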