Probabilistic bounds on neuron death in deep rectifier networks
- URL: http://arxiv.org/abs/2007.06192v2
- Date: Thu, 10 Jun 2021 20:54:09 GMT
- Title: Probabilistic bounds on neuron death in deep rectifier networks
- Authors: Blaine Rister and Daniel L. Rubin
- Abstract summary: Neuron death is a complex phenomenon with implications for model trainability.
In this work, we derive both upper and lower bounds on the probability that a ReLU network is initialized to a trainable point.
We show that it is possible to increase the depth of a network indefinitely, so long as the width increases as well.
- Score: 6.167486561517023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neuron death is a complex phenomenon with implications for model
trainability: the deeper the network, the lower the probability of finding a
valid initialization. In this work, we derive both upper and lower bounds on
the probability that a ReLU network is initialized to a trainable point, as a
function of model hyperparameters. We show that it is possible to increase the
depth of a network indefinitely, so long as the width increases as well.
Furthermore, our bounds are asymptotically tight under reasonable assumptions:
first, the upper bound coincides with the true probability for a single-layer
network with the largest possible input set. Second, the true probability
converges to our lower bound as the input set shrinks to a single point, or as
the network complexity grows under an assumption about the output variance. We
confirm these results by numerical simulation, showing rapid convergence to the
lower bound with increasing network depth. Then, motivated by the theory, we
propose a practical sign flipping scheme which guarantees that the ratio of
living data points in a $k$-layer network is at least $2^{-k}$. Finally, we
show how these issues are mitigated by network design features currently seen
in practice, such as batch normalization, residual connections, dense networks
and skip connections. This suggests that neuron death may provide insight into
the efficacy of various model architectures.
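The abstract's central claim, that deeper networks are less likely to be initialized to a trainable point unless width grows too, can be checked empirically. The following is a minimal Monte Carlo sketch (not the paper's analytical bounds): it estimates the probability that a randomly initialized bias-free ReLU network with He-style Gaussian weights maps every input to the zero vector, so that no gradient signal survives. The function name, the uniform input distribution, and the trial counts are illustrative choices, not taken from the paper.

```python
import numpy as np

def born_dead_prob(depth, width, n_inputs=100, n_trials=200, seed=0):
    """Monte Carlo estimate of the probability that a randomly
    initialized ReLU network kills every input (all activations zero).
    He-style Gaussian init, no biases; inputs uniform on [-1, 1]^width.
    """
    rng = np.random.default_rng(seed)
    dead = 0
    for _ in range(n_trials):
        x = rng.uniform(-1.0, 1.0, size=(n_inputs, width))
        for _ in range(depth):
            w = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
            x = np.maximum(x @ w, 0.0)  # ReLU layer
        # the network is "born dead" if every input's final activation is zero
        if np.all(np.max(x, axis=1) == 0.0):
            dead += 1
    return dead / n_trials
```

Consistent with the abstract, a narrow deep network (e.g. `depth=20, width=2`) dies at initialization far more often than a shallow wider one (e.g. `depth=2, width=16`): once every weight column feeding a layer is negative on the positive orthant, all downstream activations are zero forever.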
Related papers
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z) - Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization [5.678271181959529]
We study the evolution of the angle between two inputs to a ReLU neural network as a function of the number of layers.
We validate our theoretical results with Monte Carlo experiments and show that our results accurately approximate finite network behaviour.
We also empirically investigate how the depth degeneracy phenomenon can negatively impact training of real networks.
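The depth degeneracy phenomenon this summary describes can be reproduced with a short Monte Carlo experiment in the spirit of the paper's validation: track the angle between two inputs as they pass through random ReLU layers. This is a hedged sketch assuming He-style Gaussian weights and no biases; the function name and trial counts are illustrative.

```python
import numpy as np

def mean_angle(depth, width=100, n_trials=50, seed=0):
    """Average angle (radians) between two random inputs after
    `depth` random ReLU layers (He-style Gaussian init, no biases)."""
    rng = np.random.default_rng(seed)
    angles = []
    for _ in range(n_trials):
        x = rng.normal(size=width)
        y = rng.normal(size=width)
        for _ in range(depth):
            w = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
            x = np.maximum(w @ x, 0.0)
            y = np.maximum(w @ y, 0.0)
        cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return float(np.mean(angles))
```

Two random high-dimensional inputs start out nearly orthogonal, but the mean angle shrinks monotonically toward zero as layers are added, which is the vanishing-angle degeneracy the paper analyzes.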
arXiv Detail & Related papers (2023-02-20T01:30:27Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time and the likelihood ratio loss with interarrival time probability assumptions can greatly improve the model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z) - The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - It's Hard for Neural Networks To Learn the Game of Life [4.061135251278187]
Recent findings suggest that neural networks rely on lucky random initial weights of "lottery tickets" that converge quickly to a solution.
We examine small convolutional networks that are trained to predict n steps of the two-dimensional cellular automaton Conway's Game of Life.
We find that networks of this architecture trained on this task rarely converge.
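The striking part of this result is that the target function the networks fail to learn is itself tiny. As context for the summary above, here is a minimal numpy sketch of one Game of Life step (the function name and toroidal boundary condition are illustrative choices; the paper's exact setup may differ):

```python
import numpy as np

def life_step(grid):
    """One step of Conway's Game of Life on a 2D 0/1 array,
    with toroidal (wraparound) boundaries."""
    # count the 8 neighbours by summing shifted copies of the grid
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # a cell lives next step if it has exactly 3 neighbours,
    # or is currently alive with exactly 2 neighbours
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)
```

The rule is a local function of a 3x3 neighbourhood, so in principle a small convolutional network can represent it exactly; the paper's finding is that gradient descent from random initialization rarely finds such a solution.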
arXiv Detail & Related papers (2020-09-03T00:47:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.