Related papers: Stably unactivated neurons in ReLU neural networks

URL: http://arxiv.org/abs/2412.06829v2
Date: Tue, 17 Dec 2024 17:28:59 GMT
Title: Stably unactivated neurons in ReLU neural networks
Authors: Natalie Brownlowe, Christopher R. Cornwell, Ethan Montes, Gabriel Quijano, Grace Stulman, Na Zhang
Abstract summary: In ReLU neural networks, the presence of stably unactivated neurons can reduce the network's expressiveness. In this work, we investigate the probability of a neuron in the second hidden layer of such neural networks being stably unactivated.
Score: 1.347660513756976
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The choice of architecture of a neural network influences which functions will be realizable by that neural network and, as a result, studying the expressiveness of a chosen architecture has received much attention. In ReLU neural networks, the presence of stably unactivated neurons can reduce the network's expressiveness. In this work, we investigate the probability of a neuron in the second hidden layer of such neural networks being stably unactivated when the weights and biases are initialized from symmetric probability distributions. For networks with input dimension $n_0$, we prove that if the first hidden layer has $n_0+1$ neurons then this probability is exactly $\frac{2^{n_0}+1}{4^{n_0+1}}$, and if the first hidden layer has $n_1$ neurons, $n_1 \le n_0$, then the probability is $\frac{1}{2^{n_1+1}}$. Finally, for the case when the first hidden layer has more neurons than $n_0+1$, a conjecture is proposed along with the rationale. Computational evidence is presented to support the conjecture.
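
As a quick illustration of the two closed-form results stated in the abstract, the following minimal sketch evaluates them with exact rational arithmetic. The function name `stably_unactivated_probability` is hypothetical and introduced here only for illustration; it is not the authors' code, and it only transcribes the formulas quoted above. The case $n_1 > n_0+1$ is covered only by a conjecture that the abstract does not state, so the sketch refuses to return a value there.

```python
from fractions import Fraction


def stably_unactivated_probability(n0: int, n1: int) -> Fraction:
    """Probability quoted in the abstract that a second-hidden-layer neuron
    is stably unactivated, for input dimension n0 and first-layer width n1.

    Hypothetical helper: it only evaluates the stated closed forms.
    """
    if n0 < 1 or n1 < 1:
        raise ValueError("n0 and n1 must be positive integers")
    if n1 <= n0:
        # Stated result for n1 <= n0: 1 / 2^(n1 + 1)
        return Fraction(1, 2 ** (n1 + 1))
    if n1 == n0 + 1:
        # Stated result for n1 = n0 + 1: (2^n0 + 1) / 4^(n0 + 1)
        return Fraction(2 ** n0 + 1, 4 ** (n0 + 1))
    # n1 > n0 + 1: the abstract mentions only a conjecture, not a formula.
    raise ValueError("no closed form is stated for n1 > n0 + 1")


if __name__ == "__main__":
    for n0 in (1, 2, 3):
        for n1 in range(1, n0 + 2):
            p = stably_unactivated_probability(n0, n1)
            print(f"n0={n0}, n1={n1}: {p} ~ {float(p):.5f}")
```

For example, with $n_0 = 1$ and $n_1 = 2$ the formula gives $\frac{2+1}{4^2} = \frac{3}{16}$, and with $n_1 = 1$ it gives $\frac{1}{4}$.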

Related papers

Should Under-parameterized Student Networks Copy or Average Teacher Weights? [7.777410338143785]
We consider the case when $f^*$ itself is a neural network with one hidden layer and $k$ neurons. As the student has fewer neurons than the teacher, it is unclear whether each of the $n$ student neurons should copy one of the teacher neurons or rather average a group of teacher neurons. We find for the erf activation function that gradient flow converges either to the optimal copy-average critical point or to another point where each student neuron approximately copies a different teacher neuron.
arXiv Detail & Related papers (2023-11-03T00:21:36Z)
Identifying Interpretable Visual Features in Artificial and Biological Neural Systems [3.604033202771937]
Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features. Many neurons exhibit $\textit{mixed selectivity}$, i.e., they represent multiple unrelated features. We propose an automated method for quantifying visual interpretability and an approach for finding meaningful directions in network activation space.
arXiv Detail & Related papers (2023-10-17T17:41:28Z)
Expressivity of Spiking Neural Networks [15.181458163440634]
We study the capabilities of spiking neural networks where information is encoded in the firing time of neurons. In contrast to ReLU networks, we prove that spiking neural networks can realize both continuous and discontinuous functions.
arXiv Detail & Related papers (2023-08-16T08:45:53Z)
Generalization Ability of Wide Neural Networks on $\mathbb{R}$ [8.508360765158326]
We study the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$. We show that: $i)$ when the width $m \rightarrow \infty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated to $K_1$ is $n^{-2/3}$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv
arXiv Detail & Related papers (2023-02-12T15:07:27Z)
Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption. They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware. A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK). In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks. We define AND-like neurons and propose measures to increase their proportion in the network. Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
Neuron-based explanations of neural networks sacrifice completeness and interpretability [67.53271920386851]
We show that for AlexNet pretrained on ImageNet, neuron-based explanation methods sacrifice both completeness and interpretability. We show the most important principal components provide more complete and interpretable explanations than the most important neurons. Our findings suggest that explanation methods for networks like AlexNet should avoid using neurons as a basis for embeddings.
arXiv Detail & Related papers (2020-11-05T21:26:03Z)
Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks. We show that neural representation can achieve improved sample complexities compared with the raw input. Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
Network size and weights size for memorization with two-layers neural networks [15.333300054767726]
We propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons. We show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.
arXiv Detail & Related papers (2020-06-04T13:44:57Z)
Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy. We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
