Stochastic Subnetwork Annealing: A Regularization Technique for Fine
Tuning Pruned Subnetworks
- URL: http://arxiv.org/abs/2401.08830v1
- Date: Tue, 16 Jan 2024 21:07:04 GMT
- Title: Stochastic Subnetwork Annealing: A Regularization Technique for Fine
Tuning Pruned Subnetworks
- Authors: Tim Whitaker, Darrell Whitley
- Abstract summary: Large numbers of parameters can be removed from trained models with little discernible loss in accuracy after a small number of continued training epochs.
Iterative pruning approaches mitigate the steep accuracy drop caused by pruning too many parameters at once by gradually removing a small number of parameters over multiple epochs.
We introduce a novel and effective approach to tuning subnetworks through a regularization technique we call Stochastic Subnetwork Annealing.
- Score: 4.8951183832371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning methods have recently grown in popularity as an effective way to
reduce the size and computational complexity of deep neural networks. Large
numbers of parameters can be removed from trained models with little
discernible loss in accuracy after a small number of continued training epochs.
However, pruning too many parameters at once often causes an initial steep drop
in accuracy which can undermine convergence quality. Iterative pruning
approaches mitigate this by gradually removing a small number of parameters
over multiple epochs. However, this can still lead to subnetworks that overfit
local regions of the loss landscape. We introduce a novel and effective
approach to tuning subnetworks through a regularization technique we call
Stochastic Subnetwork Annealing. Instead of removing parameters in a discrete
manner, we represent subnetworks with stochastic masks where each
parameter has a probabilistic chance of being included or excluded on any given
forward pass. We anneal these probabilities over time such that subnetwork
structure slowly evolves as mask values become more deterministic, allowing for
a smoother and more robust optimization of subnetworks at high levels of
sparsity.
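The abstract describes the mechanism only at a high level. Below is a minimal PyTorch-style sketch of one way the idea could be implemented; the class name StochasticMaskedLinear, the randomly chosen target mask, the initial keep probability of 0.5, and the linear annealing schedule are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticMaskedLinear(nn.Linear):
    """Linear layer whose weights are masked stochastically on each forward pass.

    Illustrative sketch: the target subnetwork, the starting keep probability,
    and the linear schedule are assumptions, not the paper's exact recipe.
    """

    def __init__(self, in_features, out_features, sparsity=0.9, init_keep_prob=0.5):
        super().__init__(in_features, out_features)
        # Randomly choose the target subnetwork: 1 = keep, 0 = prune.
        target = (torch.rand(out_features, in_features) > sparsity).float()
        self.register_buffer("target_mask", target)
        # Weights destined for pruning start with a soft keep probability,
        # so early fine-tuning still sees them occasionally.
        start = torch.where(target.bool(), torch.ones_like(target),
                            torch.full_like(target, init_keep_prob))
        self.register_buffer("start_prob", start)
        self.register_buffer("keep_prob", start.clone())

    def anneal(self, step, total_steps):
        """Linearly interpolate keep probabilities toward the deterministic target mask."""
        t = min(step / total_steps, 1.0)
        self.keep_prob = (1.0 - t) * self.start_prob + t * self.target_mask

    def forward(self, x):
        # Sample a fresh Bernoulli mask per forward pass during training; as
        # keep_prob anneals to 0/1 the sampled mask converges to target_mask.
        mask = torch.bernoulli(self.keep_prob) if self.training else self.target_mask
        return F.linear(x, self.weight * mask, self.bias)
```

In use, one would call layer.anneal(step, total_steps) once per fine-tuning step (or epoch) before the forward pass; by the end of the schedule the layer behaves like a fixed pruned subnetwork defined by target_mask, while earlier steps see a slowly evolving stochastic subnetwork.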
Related papers
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Layer Ensembles [95.42181254494287]
We introduce a method for uncertainty estimation that considers a set of independent categorical distributions for each layer of the network.
We show that the method can be further improved by ranking samples, resulting in models that require less memory and time to run.
arXiv Detail & Related papers (2022-10-10T17:52:47Z)
- Likelihood-Free Inference with Generative Neural Networks via Scoring Rule Minimization [0.0]
Inference methods yield posterior approximations for simulator models with intractable likelihood.
Many works trained neural networks to approximate either the intractable likelihood or the posterior directly.
Here, we propose to approximate the posterior with generative networks trained by Scoring Rule minimization.
arXiv Detail & Related papers (2022-05-31T13:32:55Z)
- Boosting Pruned Networks with Linear Over-parameterization [8.796518772724955]
Structured pruning compresses neural networks by reducing channels (filters) for fast inference and low footprint at run-time.
To restore accuracy after pruning, fine-tuning is usually applied to pruned networks.
We propose a novel method that first linearly over-parameterizes the compact layers in pruned networks to enlarge the number of fine-tuning parameters.
arXiv Detail & Related papers (2022-04-25T05:30:26Z)
- Extracting Effective Subnetworks with Gumbel-Softmax [9.176056742068813]
We devise an alternative pruning method that allows extracting effective subnetworks from larger untrained ones.
Our method extracts subnetworks by exploring different topologies sampled using Gumbel-Softmax.
The resulting subnetworks are further enhanced using a highly efficient rescaling mechanism that reduces training time and improves performance.
arXiv Detail & Related papers (2022-02-25T21:31:30Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- Tunable Subnetwork Splitting for Model-parallelism of Neural Network Training [12.755664985045582]
We propose a Tunable Subnetwork Splitting Method (TSSM) to tune the decomposition of deep neural networks.
Our proposed TSSM can achieve significant speedup without observable loss of training accuracy.
arXiv Detail & Related papers (2020-09-09T01:05:12Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Progressive Skeletonization: Trimming more fat from a network at initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance at higher pruning levels.
arXiv Detail & Related papers (2020-06-16T11:32:47Z)