EDropout: Energy-Based Dropout and Pruning of Deep Neural Networks
- URL: http://arxiv.org/abs/2006.04270v5
- Date: Mon, 7 Mar 2022 15:33:11 GMT
- Title: EDropout: Energy-Based Dropout and Pruning of Deep Neural Networks
- Authors: Hojjat Salehinejad and Shahrokh Valaee
- Abstract summary: We propose EDropout as an energy-based framework for pruning neural networks in classification tasks.
A set of binary pruning state vectors (population) represents a set of corresponding sub-networks from an arbitrary provided original neural network.
EDropout can prune typical neural networks without modification of the network architecture.
- Score: 45.4796383952516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dropout is a well-known regularization method that samples a sub-network from
a larger deep neural network and trains different sub-networks on different
subsets of the data. Inspired by the dropout concept, we propose EDropout as an
energy-based framework for pruning neural networks in classification tasks. In
this approach, a set of binary pruning state vectors (population) represents a
set of corresponding sub-networks from an arbitrary provided original neural
network. An energy loss function assigns a scalar energy loss value to each
pruning state. The energy-based model stochastically evolves the population to
find states with lower energy loss. The best pruning state is then selected and
applied to the original network. Similar to dropout, the kept weights are
updated using backpropagation in a probabilistic model. The energy-based model
again searches for better pruning states and the cycle continues. In effect,
each iteration alternates between the energy model, which manages the pruning
states, and the probabilistic model, which updates the temporarily unpruned
weights. The population can dynamically converge to
a pruning state. This can be interpreted as dropout leading to pruning the
network. From an implementation perspective, EDropout can prune typical neural
networks without modification of the network architecture. We evaluated the
proposed method on different flavours of ResNets, AlexNet, and SqueezeNet on
the Kuzushiji, Fashion, CIFAR-10, CIFAR-100, and Flowers datasets, and compared
the pruning rate and classification performance of the models. On average the
networks trained with EDropout achieved a pruning rate of more than $50\%$ of
the trainable parameters with less than $5\%$ and $1\%$ drops in Top-1 and
Top-5 classification accuracy, respectively.
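The alternation described in the abstract can be made concrete with a short sketch. The code below is an illustrative reconstruction, not the authors' implementation: the `MaskedMLP` toy model, the bit-mixing/bit-flip population update, the greedy acceptance rule, and hyper-parameters such as `flip_prob` are placeholder assumptions, and the states here prune hidden units rather than individual weights for brevity.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedMLP(nn.Module):
    """Toy network whose hidden units can be switched off by a binary state."""
    def __init__(self, d_in=784, d_hidden=256, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, n_classes)

    def forward(self, x, state):
        # `state` is a {0, 1} vector over hidden units: the pruning state.
        h = F.relu(self.fc1(x)) * state
        return self.fc2(h)

def energy(model, state, x, y):
    # Energy loss of a pruning state = task loss of its sub-network.
    with torch.no_grad():
        return F.cross_entropy(model(x, state), y).item()

def edropout_step(model, optimizer, population, x, y, flip_prob=0.05):
    # 1) Energy model: stochastically evolve the population toward lower energy.
    energies = [energy(model, s, x, y) for s in population]
    best = population[min(range(len(population)), key=lambda i: energies[i])]
    for i, s in enumerate(population):
        copy = torch.rand_like(s) < 0.5            # mix in bits from the best state
        candidate = torch.where(copy, best, s)
        flip = torch.rand_like(s) < flip_prob      # plus random bit flips
        candidate = torch.where(flip, 1.0 - candidate, candidate)
        if energy(model, candidate, x, y) < energies[i]:
            population[i] = candidate              # greedy acceptance of lower energy
    # 2) Probabilistic model: keep the best state and update the kept weights by
    #    backpropagation; units dropped by `best` contribute nothing to this update.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x, best), y)
    loss.backward()
    optimizer.step()
    return best, loss.item()
```
In this sketch the population could be initialized as, e.g., `[torch.bernoulli(torch.full((256,), 0.5)) for _ in range(8)]` and `edropout_step` called once per mini-batch; once the population converges, the last best state plays the role of the final pruning mask applied to the network.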
Related papers
- Learning effective pruning at initialization from iterative pruning [15.842658282636876]
We present an end-to-end neural network-based PaI method to reduce training costs.
Our approach outperforms existing methods in high-sparsity settings.
As this is the first neural network-based PaI method, we conduct extensive experiments to validate the factors that influence the approach.
arXiv Detail & Related papers (2024-08-27T03:17:52Z) - Learning a Consensus Sub-Network with Polarization Regularization and
One Pass Training [3.2214522506924093]
Pruning schemes create extra overhead either by iterative training and fine-tuning for static pruning or repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the subnetwork which is in trainable condition and its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - FocusedDropout for Convolutional Neural Network [6.066543113636522]
FocusedDropout is a non-random dropout method to make the network focus more on the target.
Even at a slight cost, applying FocusedDropout to only 10% of batches can produce a noticeable performance boost over the baselines.
arXiv Detail & Related papers (2021-03-29T08:47:55Z) - A Framework For Pruning Deep Neural Networks Using Energy-Based Models [45.4796383952516]
A typical deep neural network (DNN) has a large number of trainable parameters.
We propose a framework for pruning DNNs based on a population-based global optimization method.
Experiments on ResNets, AlexNet, and SqueezeNet show a pruning rate of more than $50\%$ of the trainable parameters.
arXiv Detail & Related papers (2021-02-25T21:44:19Z) - MaxDropout: Deep Neural Network Regularization Based on Maximum Output
Values [0.0]
MaxDropout is a regularizer for deep neural network models that works in a supervised fashion by removing prominent neurons.
We show that replacing Dropout with MaxDropout improves existing neural networks and yields better results (a rough sketch of the idea appears after this list).
arXiv Detail & Related papers (2020-07-27T17:55:54Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
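Regarding the MaxDropout entry above, the following is a rough sketch of the "suppress the most prominent activations" idea, written only from the one-sentence summary; the min-max normalization and the threshold rule are assumptions for illustration and not necessarily the paper's exact formulation.
```python
import torch
import torch.nn as nn

class ProminentUnitDropout(nn.Module):
    """Drop the most active units instead of random ones (MaxDropout-style idea)."""
    def __init__(self, drop_rate: float = 0.3):
        super().__init__()
        self.drop_rate = drop_rate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x  # behave like standard dropout at inference time
        # Min-max normalize the activations, then zero those above the threshold,
        # so the most prominent units are the ones suppressed.
        norm = (x - x.min()) / (x.max() - x.min() + 1e-8)
        mask = (norm <= 1.0 - self.drop_rate).to(x.dtype)
        return x * mask
```
Used as a drop-in replacement for `nn.Dropout` after an activation layer, this keeps the comparison with standard random dropout straightforward.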
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.