Stochastic Subnetwork Annealing: A Regularization Technique for Fine
Tuning Pruned Subnetworks
- URL: http://arxiv.org/abs/2401.08830v1
- Date: Tue, 16 Jan 2024 21:07:04 GMT
- Title: Stochastic Subnetwork Annealing: A Regularization Technique for Fine
Tuning Pruned Subnetworks
- Authors: Tim Whitaker, Darrell Whitley
- Abstract summary: Large numbers of parameters can be removed from trained models with little discernible loss in accuracy after a small number of continued training epochs.
Iterative pruning approaches mitigate the steep accuracy drop caused by pruning too many parameters at once by gradually removing a small number of parameters over multiple epochs.
We introduce a novel and effective approach to tuning subnetworks through a regularization technique we call Stochastic Subnetwork Annealing.
- Score: 4.8951183832371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning methods have recently grown in popularity as an effective way to
reduce the size and computational complexity of deep neural networks. Large
numbers of parameters can be removed from trained models with little
discernible loss in accuracy after a small number of continued training epochs.
However, pruning too many parameters at once often causes an initial steep drop
in accuracy which can undermine convergence quality. Iterative pruning
approaches mitigate this by gradually removing a small number of parameters
over multiple epochs. However, this can still lead to subnetworks that overfit
local regions of the loss landscape. We introduce a novel and effective
approach to tuning subnetworks through a regularization technique we call
Stochastic Subnetwork Annealing. Instead of removing parameters in a discrete
manner, we represent subnetworks with stochastic masks where each
parameter has a probabilistic chance of being included or excluded on any given
forward pass. We anneal these probabilities over time such that subnetwork
structure slowly evolves as mask values become more deterministic, allowing for
a smoother and more robust optimization of subnetworks at high levels of
sparsity.
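The abstract describes the mechanism only at a high level. Below is a minimal PyTorch-style sketch of one way the idea could be implemented; the class name StochasticMaskedLinear, the randomly chosen target mask, the initial keep probability of 0.5, and the linear annealing schedule are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticMaskedLinear(nn.Linear):
    """Linear layer whose weights are masked stochastically on each forward pass.

    Illustrative sketch: the target subnetwork, the starting keep probability,
    and the linear schedule are assumptions, not the paper's exact recipe.
    """

    def __init__(self, in_features, out_features, sparsity=0.9, init_keep_prob=0.5):
        super().__init__(in_features, out_features)
        # Randomly choose the target subnetwork: 1 = keep, 0 = prune.
        target = (torch.rand(out_features, in_features) > sparsity).float()
        self.register_buffer("target_mask", target)
        # Weights destined for pruning start with a soft keep probability,
        # so early fine-tuning still sees them occasionally.
        start = torch.where(target.bool(), torch.ones_like(target),
                            torch.full_like(target, init_keep_prob))
        self.register_buffer("start_prob", start)
        self.register_buffer("keep_prob", start.clone())

    def anneal(self, step, total_steps):
        """Linearly interpolate keep probabilities toward the deterministic target mask."""
        t = min(step / total_steps, 1.0)
        self.keep_prob = (1.0 - t) * self.start_prob + t * self.target_mask

    def forward(self, x):
        # Sample a fresh Bernoulli mask per forward pass during training; as
        # keep_prob anneals to 0/1 the sampled mask converges to target_mask.
        mask = torch.bernoulli(self.keep_prob) if self.training else self.target_mask
        return F.linear(x, self.weight * mask, self.bias)
```

In use, one would call layer.anneal(step, total_steps) once per fine-tuning step (or epoch) before the forward pass; by the end of the schedule the layer behaves like a fixed pruned subnetwork defined by target_mask, while earlier steps see a slowly evolving stochastic subnetwork.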
Related papers
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Layer Ensembles [95.42181254494287]
We introduce a method for uncertainty estimation that considers a set of independent categorical distributions for each layer of the network.
We show that the method can be further improved by ranking samples, resulting in models that require less memory and time to run.
arXiv Detail & Related papers (2022-10-10T17:52:47Z)
- Likelihood-Free Inference with Generative Neural Networks via Scoring Rule Minimization [0.0]
Inference methods yield posterior approximations for simulator models with intractable likelihood.
Many works trained neural networks to approximate either the intractable likelihood or the posterior directly.
Here, we propose to approximate the posterior with generative networks trained by Scoring Rule minimization.
arXiv Detail & Related papers (2022-05-31T13:32:55Z)
- Boosting Pruned Networks with Linear Over-parameterization [8.796518772724955]
Structured pruning compresses neural networks by reducing channels (filters) for fast inference and low footprint at run-time.
To restore accuracy after pruning, fine-tuning is usually applied to pruned networks.
We propose a novel method that first linearly over-parameterizes the compact layers in pruned networks to enlarge the number of fine-tuning parameters.
arXiv Detail & Related papers (2022-04-25T05:30:26Z)
- Extracting Effective Subnetworks with Gumbel-Softmax [9.176056742068813]
We devise an alternative pruning method that allows extracting effective subnetworks from larger untrained ones.
Our method extracts subnetworks by exploring different topologies sampled using Gumbel-Softmax.
The resulting subnetworks are further enhanced using a highly efficient rescaling mechanism that reduces training time and improves performance.
arXiv Detail & Related papers (2022-02-25T21:31:30Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- Tunable Subnetwork Splitting for Model-parallelism of Neural Network Training [12.755664985045582]
We propose a Tunable Subnetwork Splitting Method (TSSM) to tune the decomposition of deep neural networks.
Our proposed TSSM can achieve significant speedup without observable loss of training accuracy.
arXiv Detail & Related papers (2020-09-09T01:05:12Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Progressive Skeletonization: Trimming more fat from a network at initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance at higher pruning levels.
arXiv Detail & Related papers (2020-06-16T11:32:47Z)