Y-Drop: A Conductance based Dropout for fully connected layers
- URL: http://arxiv.org/abs/2409.09088v1
- Date: Wed, 11 Sep 2024 15:56:08 GMT
- Title: Y-Drop: A Conductance based Dropout for fully connected layers
- Authors: Efthymios Georgiou, Georgios Paraskevopoulos, Alexandros Potamianos
- Abstract summary: We introduce Y-Drop, a regularization method that biases the dropout algorithm towards dropping more important neurons with higher probability.
We show that forcing the network to solve the task at hand in the absence of its important units yields a strong regularization effect.
- Score: 63.029110722758496
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we introduce Y-Drop, a regularization method that biases the dropout algorithm towards dropping more important neurons with higher probability. The backbone of our approach is neuron conductance, an interpretable measure of neuron importance that calculates the contribution of each neuron towards the end-to-end mapping of the network. We investigate the impact of the uniform dropout selection criterion on performance by assigning higher dropout probability to the more important units. We show that forcing the network to solve the task at hand in the absence of its important units yields a strong regularization effect. Further analysis indicates that Y-Drop yields solutions where more neurons are important, i.e., have high conductance, and yields robust networks. In our experiments, we show that the regularization effect of Y-Drop scales better than vanilla dropout w.r.t. the architecture size and consistently yields superior performance over multiple datasets and architecture combinations, with little tuning.
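The idea in the abstract, dropping important units with higher probability, can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the abstract does not state how conductance is computed or how it is mapped to drop probabilities, so the function name `importance_biased_dropout`, the mean-normalised scaling rule, and the `max_rate` cap are all assumptions made for illustration. The importance scores are taken as given (in Y-Drop they would come from neuron conductance).

```python
import numpy as np

def importance_biased_dropout(activations, importance, base_rate=0.5,
                              max_rate=0.95, rng=None):
    """Drop units of a fully connected layer with probability that grows
    with a per-unit importance score (hypothetical scaling rule).

    activations: array of shape (batch, units)
    importance:  array of shape (units,), e.g. conductance estimates
    """
    rng = np.random.default_rng() if rng is None else rng
    activations = np.asarray(activations, dtype=float)
    weights = np.abs(np.asarray(importance, dtype=float))

    # Assumption: normalise scores to mean 1 so that, before clipping,
    # the average drop probability equals base_rate.
    mean = weights.mean()
    weights = weights / mean if mean > 0 else np.ones_like(weights)

    # More important units get a higher drop probability; the cap keeps
    # every unit alive at least occasionally.
    drop_prob = np.clip(base_rate * weights, 0.0, max_rate)
    keep_prob = 1.0 - drop_prob

    # Sample an independent mask per example and per unit.
    mask = rng.random(activations.shape) < keep_prob

    # Inverted-dropout rescaling, applied per unit, preserves the
    # expected activation of each unit.
    return activations * mask / keep_prob, drop_prob

# Usage: three units with increasing importance.
acts = np.ones((4, 3))
dropped, probs = importance_biased_dropout(
    acts, importance=[1.0, 2.0, 3.0], base_rate=0.3,
    rng=np.random.default_rng(0))
```

With equal importance scores the sketch reduces to standard inverted dropout at rate `base_rate`, which makes the uniform criterion a special case of the biased one.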
Related papers
- Dynamic DropConnect: Enhancing Neural Network Robustness through Adaptive Edge Dropping Strategies [2.07180164747172]
Dropout and DropConnect are well-known techniques that apply a consistent drop rate to randomly deactivate neurons or edges in a neural network layer during training.
This paper introduces a novel methodology that assigns dynamic drop rates to each edge within a layer, uniquely tailoring the dropping process without incorporating additional learning parameters.
arXiv Detail & Related papers (2025-02-27T10:17:02Z)
- Hessian Eigenvectors and Principal Component Analysis of Neural Network Weight Matrices [0.0]
This study delves into the intricate dynamics of trained deep neural networks and their relationships with network parameters.
We unveil a correlation between Hessian eigenvectors and network weights.
This relationship, hinging on the magnitude of eigenvalues, allows us to discern parameter directions within the network.
arXiv Detail & Related papers (2023-11-01T11:38:31Z)
- Decorrelating neurons using persistence [29.25969187808722]
We present two regularisation terms computed from the weights of a minimum spanning tree of a clique.
We demonstrate that naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms.
We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms.
arXiv Detail & Related papers (2023-08-09T11:09:14Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
A deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z)
- Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming [11.712073757744452]
We introduce a mixed integer program (MIP) for assigning importance scores to each neuron in deep neural network architectures.
We drive the solver to minimize the number of critical neurons (i.e., with high importance score) that need to be kept for maintaining the overall accuracy of the trained neural network.
arXiv Detail & Related papers (2020-02-17T21:32:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.