Reducing Neural Network Parameter Initialization Into an SMT Problem
- URL: http://arxiv.org/abs/2011.01191v3
- Date: Mon, 9 Nov 2020 06:28:16 GMT
- Title: Reducing Neural Network Parameter Initialization Into an SMT Problem
- Authors: Mohamad H. Danesh
- Abstract summary: Training a neural network (NN) depends on multiple factors, including but not limited to the initial weights.
In this paper, we focus on initializing deep NN parameters such that the network performs better compared to random or zero initialization.
- Score: 8.75682288556859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training a neural network (NN) depends on multiple factors, including but not
limited to the initial weights. In this paper, we focus on initializing deep NN
parameters such that the network performs better compared to random or zero
initialization. We do this by reducing the initialization process to an SMT
problem. Previous works consider certain activation functions on small NNs; in
contrast, the NN studied here is a deep network with different activation
functions. Our experiments show that the proposed approach to parameter
initialization achieves better performance compared to randomly initialized
networks.
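The abstract describes posing initialization as a query to an SMT solver. Below is a minimal sketch of that general idea using the z3-solver Python bindings: a tiny 2-2-1 ReLU network whose initial weights are constrained to classify a few seed points correctly before gradient training begins. The network size, constraint set, and weight bounds are illustrative assumptions, not the paper's exact encoding.

```python
# A minimal sketch of posing parameter initialization as an SMT problem,
# using the z3-solver Python bindings. The tiny network and constraints
# are illustrative assumptions, not the paper's exact encoding.
from z3 import Real, If, Solver, sat

def relu(x):
    # Piecewise-linear ReLU expressed as an SMT term.
    return If(x > 0, x, 0)

# Weights and biases of a 2-2-1 ReLU network as symbolic reals.
w = [[Real(f"w{i}{j}") for j in range(2)] for i in range(2)]
b = [Real(f"b{i}") for i in range(2)]
v = [Real(f"v{i}") for i in range(2)]
c = Real("c")

def forward(x):
    h = [relu(w[i][0] * x[0] + w[i][1] * x[1] + b[i]) for i in range(2)]
    return v[0] * h[0] + v[1] * h[1] + c

s = Solver()
# Ask for initial weights that already classify a few seed points by the
# sign of the output, biasing the starting point toward the task.
for x, y in [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]:
    s.add(forward(x) * y > 0)
# Keep the initial weights in a bounded range, as typical initializers do.
for p in [*w[0], *w[1], *b, *v, c]:
    s.add(p > -1, p < 1)

if s.check() == sat:
    m = s.model()
    print({str(p): m[p] for p in [*w[0], *w[1], *b, *v, c]})
```

Any model the solver returns is a concrete assignment of starting weights; gradient training would then proceed from that assignment rather than from a random draw.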
Related papers
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy [74.34895342081407]
We propose an unsupervised algorithm to find good initialization for input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
arXiv Detail & Related papers (2023-02-08T23:23:28Z)
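As a companion to the MMD-based initialization entry above, here is a minimal sketch of the (biased) squared maximum mean discrepancy statistic with an RBF kernel, the discrepancy measure such a criterion builds on. The bandwidth and toy data are illustrative assumptions, not the authors' algorithm.

```python
# A sketch of the (biased) squared MMD with an RBF kernel, the discrepancy
# that an MMD-based initialization criterion builds on. Bandwidth and toy
# data are illustrative assumptions, not the authors' algorithm.
import numpy as np

def rbf_mmd2(X, Y, bandwidth=1.0):
    """Biased estimate of MMD^2 between samples X (n x d) and Y (m x d)."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * bandwidth ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))           # sample from one distribution
Y = rng.normal(loc=1.0, size=(200, 2))  # sample from a shifted distribution
print(rbf_mmd2(X, Y))  # noticeably larger than for same-distribution samples
```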
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
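For the NTK entry above, the following sketch computes an empirical tangent kernel of a one-hidden-layer ReLU network, including the bias gradients that motivate a bias-generalized kernel. It illustrates the standard NTK computation under assumed notation; it is not the paper's derivation.

```python
# A sketch of an empirical tangent kernel for a one-hidden-layer ReLU
# network, including bias gradients; an illustration under assumed
# notation, not the paper's bias-generalized derivation.
import numpy as np

rng = np.random.default_rng(0)
m, d = 512, 3                        # hidden width, input dimension
W = rng.normal(size=(m, d))          # input weights
b = 2.0 * rng.normal(size=m)         # large bias, as in the paper's regime
a = rng.choice([-1.0, 1.0], size=m)  # output weights

def ntk(x, xp):
    # K(x, x') = <grad_{W,b} f(x), grad_{W,b} f(x')> for
    # f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x + b_r).
    act_x = (W @ x + b > 0).astype(float)
    act_xp = (W @ xp + b > 0).astype(float)
    # w_r-gradients contribute 1{.}1{.} x.x'; b_r-gradients contribute 1{.}1{.}.
    return (a ** 2 * act_x * act_xp).sum() * (x @ xp + 1.0) / m

x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(ntk(x1, x1), ntk(x1, x2))
```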
- A Weight Initialization Based on the Linear Product Structure for Neural Networks [0.0]
We study neural networks from a nonlinear point of view and propose a novel weight initialization strategy that is based on the linear product structure (LPS) of neural networks.
The proposed strategy is derived from the approximation of activation functions, using theories of numerical algebra to guarantee that all local minima are found.
arXiv Detail & Related papers (2021-09-01T00:18:59Z)
- Tensor-based framework for training flexible neural networks [9.176056742068813]
We propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem.
The proposed algorithm can handle different basis decompositions.
The goal of this method is to compress large pretrained NN models by replacing tensor networks, i.e., one or multiple layers of the original network, with a new flexible layer.
arXiv Detail & Related papers (2021-06-25T10:26:48Z)
- Towards Understanding the Condensation of Two-layer Neural Networks at Initial Training [1.1958610985612828]
We show that the singularity of the activation function at the origin is a key factor in understanding condensation in the initial stage of training.
Our experiments suggest that the maximal number of condensed orientations is twice the singularity order.
arXiv Detail & Related papers (2021-05-25T05:47:55Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
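The Hessian-tracking entry above suggests monitoring the norm of the weights' Hessian. A standard way to estimate the spectral norm without materializing the Hessian is power iteration on Hessian-vector products via double backpropagation; the sketch below does this in PyTorch on a toy model. The model, data, and iteration count are illustrative assumptions, not the paper's setup.

```python
# A sketch of estimating the Hessian spectral norm by power iteration on
# Hessian-vector products (double backprop). Toy model, data, and
# iteration count are illustrative assumptions, not the paper's setup.
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = list(model.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)

# Power iteration: v <- Hv / ||Hv|| converges to the top eigendirection,
# and ||Hv|| to the spectral norm of the loss Hessian.
v = [torch.randn_like(p) for p in params]
vnorm = torch.sqrt(sum((u ** 2).sum() for u in v))
v = [u / vnorm for u in v]
for _ in range(20):
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]
print("estimated Hessian spectral norm:", norm.item())
```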
- Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that can run machine learning applications on devices with limited computational resources.
For deep NNs, such pruning procedures remain unsatisfactory, as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z)
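To make the failure mode in the pruning entry above concrete, the sketch below applies global magnitude pruning at initialization and checks whether any layer ends up fully pruned. The layer shapes, weight scales, and sparsity level are illustrative assumptions chosen so that the failure mode actually appears.

```python
# A sketch of global magnitude pruning at initialization, with a check for
# the failure mode above: a layer pruned away entirely. Layer shapes,
# weight scales, and the sparsity level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
specs = [(0.01, (64, 10)), (0.5, (32, 64)), (0.5, (1, 32))]
layers = [rng.normal(scale=s, size=shape) for s, shape in specs]

sparsity = 0.9  # prune 90% of all weights by global magnitude ranking
threshold = np.quantile(np.concatenate([np.abs(W).ravel() for W in layers]),
                        sparsity)
masks = [np.abs(W) > threshold for W in layers]

for i, mask in enumerate(masks):
    kept = mask.mean()
    print(f"layer {i}: {kept:.1%} of weights kept")
    if kept == 0.0:
        print(f"layer {i} is fully pruned -- the network is disconnected")
```

Because the first layer's weights are drawn at a much smaller scale, the global threshold removes all of them, leaving a disconnected network, exactly the kind of outcome the entry says naive pruning at initialization does not prevent.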
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based learning combined with nonconvexity renders training susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
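For the layer-fusion entry above, here is a minimal sketch of the special case where fusion is exact: two adjacent linear layers collapse into one with zero mean-squared error. The paper's contribution concerns the harder, MSE-optimal fusion in the presence of nonlinearities; the shapes here are illustrative assumptions.

```python
# A sketch of the exact special case of layer fusion: two adjacent linear
# layers collapse into one with zero MSE. The paper tackles the harder
# MSE-optimal fusion across nonlinearities; shapes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 10)), rng.normal(size=32)
W2, b2 = rng.normal(size=(4, 32)), rng.normal(size=4)

# y = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W_fused, b_fused = W2 @ W1, W2 @ b1 + b2

x = rng.normal(size=10)
assert np.allclose(W2 @ (W1 @ x + b1) + b2, W_fused @ x + b_fused)
```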