Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of
Neurons
- URL: http://arxiv.org/abs/2403.07688v1
- Date: Tue, 12 Mar 2024 14:28:06 GMT
- Title: Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of
Neurons
- Authors: Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan
Pascanu, Pierre-Luc Bacon, Aristide Baratin
- Abstract summary: We introduce DemP, a method that controls the proliferation of dead neurons, dynamically leading to sparsity.
Experiments on CIFAR10 and ImageNet datasets demonstrate superior accuracy-sparsity tradeoffs.
- Score: 27.289945121113277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When training deep neural networks, the phenomenon of "dying
neurons" (units that become inactive or saturated, outputting zero during
training) has traditionally been viewed as undesirable, linked with
optimization challenges, and contributing to plasticity loss in
continual learning scenarios. In this paper, we reassess this phenomenon,
focusing on sparsity and pruning. By systematically exploring the impact of
various hyperparameter configurations on dying neurons, we unveil their
potential to facilitate simple yet effective structured pruning algorithms. We
introduce Demon Pruning (DemP), a method that controls the
proliferation of dead neurons, dynamically leading to network sparsity.
Achieved through a combination of noise injection on active units and a
one-cycled schedule regularization strategy, DemP stands out for its simplicity
and broad applicability. Experiments on CIFAR10 and ImageNet datasets
demonstrate that DemP surpasses existing structured pruning techniques,
showcasing superior accuracy-sparsity tradeoffs and training speedups. These
findings suggest a novel perspective on dying neurons as a valuable resource
for efficient model compression and optimization.
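The abstract only names DemP's two ingredients (noise injection on active units and a scheduled regularizer), so the following is a minimal, hypothetical PyTorch sketch of the kind of dead-neuron bookkeeping such a method relies on, not the authors' implementation; the function names, layer sizes, and noise scale `sigma` are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy MLP: Linear -> ReLU -> Linear. Hidden units that never fire are "dead".
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

@torch.no_grad()
def dead_unit_mask(net, batch, eps=1e-8):
    """Boolean mask over hidden units whose post-ReLU output is (near-)zero on the whole batch."""
    acts = net[1](net[0](batch))                    # post-ReLU activations, shape (batch, 64)
    return acts.abs().max(dim=0).values <= eps      # True where a unit never fires

@torch.no_grad()
def inject_noise_on_active(net, mask, sigma=1e-3):
    """Perturb incoming weights of still-active units (one plausible reading of 'noise injection')."""
    active = ~mask
    w = net[0].weight                               # rows index hidden units
    w[active] += sigma * torch.randn_like(w[active])

batch = torch.randn(256, 32)
mask = dead_unit_mask(model, batch)
inject_noise_on_active(model, mask)
print(f"dead units this batch: {int(mask.sum())} / {mask.numel()}")
# Structured pruning then amounts to dropping the dead rows of net[0].weight / net[0].bias
# and the matching columns of net[2].weight, shrinking both layers.
```

In a full DemP-style loop one would presumably repeat this check during training, ramp the regularization strength with a one-cycle schedule, and physically remove the dead rows and columns to obtain the reported training speedups.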
Related papers
- Augmented Neural Fine-Tuning for Efficient Backdoor Purification [16.74156528484354]
Recent studies have revealed the vulnerability of deep neural networks (DNNs) to various backdoor attacks.
We propose Neural mask Fine-Tuning (NFT) with an aim to optimally re-organize the neuron activities.
NFT relaxes the trigger synthesis process and eliminates the requirement of the adversarial search module.
arXiv Detail & Related papers (2024-07-14T02:36:54Z)
- Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference [2.0822643340897273]
We show that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model.
We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task.
arXiv Detail & Related papers (2023-11-13T08:18:44Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Energy Efficient Training of SNN using Local Zeroth Order Method [18.81001891391638]
Spiking neural networks are becoming increasingly popular for their low energy requirement in real-world tasks.
SNN training algorithms face the loss of gradient information and non-differentiability due to the Heaviside function.
We propose a differentiable approximation of the Heaviside in the backward pass, while the forward pass uses the Heaviside as the spiking function (a generic sketch of this surrogate-gradient pattern appears after this list).
arXiv Detail & Related papers (2023-02-02T06:57:37Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Neuro-Inspired Deep Neural Networks with Sparse, Strong Activations [11.707981310045742]
End-to-end training of Deep Neural Networks (DNNs) yields state-of-the-art performance in an increasing array of applications.
We report here on a promising neuro-inspired approach to perturbations with sparser and stronger activations.
arXiv Detail & Related papers (2022-02-26T06:19:05Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs because performance prediction requires model training.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant, GraNet-ST.
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- Dynamic Hard Pruning of Neural Networks at the Edge of the Internet [11.605253906375424]
The Dynamic Hard Pruning (DynHP) technique incrementally prunes the network during training.
DynHP enables a tunable size reduction of the final neural network and reduces the NN memory occupancy during training.
Freed memory is reused by a dynamic batch sizing approach to counterbalance the accuracy degradation caused by the hard pruning strategy.
arXiv Detail & Related papers (2020-11-17T10:23:28Z)
- Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks [55.0627904986664]
Spiking Neural Networks (SNNs) use temporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation.
This paper investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective on the design of future deep SNNs and neuromorphic hardware systems.
arXiv Detail & Related papers (2020-03-26T11:13:07Z)
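Several of the SNN entries above hinge on the same basic trick noted in the local zeroth-order item: keep the non-differentiable Heaviside spike in the forward pass and substitute a smooth surrogate derivative in the backward pass. As a rough, generic illustration (not code from any of the listed papers; the fast-sigmoid surrogate and its slope of 10 are arbitrary choices), a PyTorch sketch might look like:

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, membrane_potential):
        ctx.save_for_backward(membrane_potential)
        return (membrane_potential > 0).float()     # spike iff the potential crosses threshold 0

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Derivative of a fast sigmoid, sharply peaked around the threshold.
        surrogate = 1.0 / (1.0 + 10.0 * v.abs()) ** 2
        return grad_output * surrogate

v = torch.randn(8, requires_grad=True)
spikes = SpikeFn.apply(v)
spikes.sum().backward()
print(v.grad)   # nonzero gradients flow despite the step-function forward pass
```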
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.