Related papers: COPS: Controlled Pruning Before Training Starts

URL: http://arxiv.org/abs/2107.12673v1
Date: Tue, 27 Jul 2021 08:48:01 GMT
Title: COPS: Controlled Pruning Before Training Starts
Authors: Paul Wimmer, Jens Mehnert, Alexandru Condurache
Abstract summary: State-of-the-art deep neural network (DNN) pruning techniques, applied one-shot before training starts, evaluate sparse architectures with the help of a single criterion -- called pruning score. In this work we do not concentrate on a single pruning criterion, but provide a framework for combining arbitrary generalized synaptic scores (GSSs) to create more powerful pruning strategies.
Score: 68.8204255655161
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: State-of-the-art deep neural network (DNN) pruning techniques, applied one-shot before training starts, evaluate sparse architectures with the help of a single criterion -- called pruning score. Pruning weights based on a solitary score works well for some architectures and pruning rates but may also fail for other ones. As a common baseline for pruning scores, we introduce the notion of a generalized synaptic score (GSS). In this work we do not concentrate on a single pruning criterion, but provide a framework for combining arbitrary GSSs to create more powerful pruning strategies. These COmbined Pruning Scores (COPS) are obtained by solving a constrained optimization problem. Optimizing for more than one score prevents the sparse network from overly specializing on an individual task, and thus COntrols Pruning before training Starts. The combinatorial optimization problem given by COPS is relaxed to a linear program (LP). This LP is solved analytically and determines a solution for COPS. Furthermore, an algorithm to compute it for two scores numerically is proposed and evaluated. Solving COPS in such a way has lower complexity than the best general LP solver. In our experiments we compared pruning with COPS against state-of-the-art methods for different network architectures and image classification tasks and obtained improved results.
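Illustration: the abstract does not spell out the COPS optimization problem, so the following is only a rough sketch of one way two pruning scores could be combined under a constraint. It is not the authors' analytic LP solution. It assumes non-negative per-weight scores `s1` and `s2`, a budget of `k` weights to keep, and a hypothetical parameter `alpha` in [0, 1] that requires the kept weights to retain at least a fraction `alpha` of the best achievable `s2` total. The names `top_k_mask` and `combined_pruning_mask` are invented for this example. The sketch bisects on an interpolation weight between the two rankings, which is a Lagrangian-style heuristic and may not reach the exact LP optimum.

```python
import numpy as np

def top_k_mask(score, k):
    """Boolean mask that keeps the k entries with the largest score."""
    mask = np.zeros(score.shape[0], dtype=bool)
    mask[np.argpartition(score, -k)[-k:]] = True
    return mask

def combined_pruning_mask(s1, s2, k, alpha, iters=40):
    """Keep k weights, favouring s1 while retaining enough of s2.

    Assumption (not from the paper): the constraint is
    sum(s2[kept]) >= alpha * (best possible top-k sum of s2).
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    target = alpha * s2[top_k_mask(s2, k)].sum()

    # If ranking by s1 alone already satisfies the constraint, use it.
    best = top_k_mask(s1, k)
    if s2[best].sum() >= target:
        return best

    # Otherwise bisect on the interpolation weight lam in [0, 1].
    # lam = 1 ranks by s2 alone and is always feasible for alpha <= 1.
    lo, hi = 0.0, 1.0
    best = top_k_mask(s2, k)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        cand = top_k_mask((1.0 - lam) * s1 + lam * s2, k)
        if s2[cand].sum() >= target:
            hi, best = lam, cand   # feasible: try to lean more on s1
        else:
            lo = lam               # infeasible: lean more on s2
    return best
```

For example, `combined_pruning_mask(s1, s2, k=3, alpha=0.5)` returns a boolean mask with exactly three kept weights whose `s2` total is at least half of the best achievable top-3 `s2` total. With `alpha=0` the result reduces to ordinary top-k pruning by `s1`.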

Related papers

The role of quantum and classical correlations in shrinking algorithms for optimization [0.0]
We study the performance of a shrinking algorithm for combinatorial optimization problems (COPs). We compare the performance of the algorithm equipped with correlations from the quantum approximate optimization algorithm (QAOA) as well as the classical linear programming (LP) and semi-definite programming (SDP) relaxations. Our results indicate that LP outperforms all other approaches for low-density instances, while SDP excels for high-density problems.
arXiv Detail & Related papers (2024-04-26T08:29:04Z)
Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits [12.674929126684528]
We address the problem of semi-bandits, where a player selects among P actions from the power set of a set containing d base items. We show that our approach efficiently leverages the semi-bandit feedback and outperforms bandit feedback approaches.
arXiv Detail & Related papers (2024-02-23T08:07:54Z)
A Unified Framework for Soft Threshold Pruning [27.853698217792456]
We reformulate soft threshold pruning as an implicit optimization problem solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA). We derive an optimal threshold scheduler through an in-depth study of threshold scheduling based on our framework. In principle, the derived pruning algorithm could sparsify any mathematical model trained via SGD.
arXiv Detail & Related papers (2023-02-25T08:16:14Z)
AskewSGD : An Annealed interval-constrained Optimisation method to train Quantized Neural Networks [12.229154524476405]
We develop a new algorithm, Annealed Skewed SGD (AskewSGD), for training deep neural networks (DNNs) with quantized weights. Unlike algorithms with active sets and feasible directions, AskewSGD avoids projections or optimization under the entire feasible set. Experimental results show that the AskewSGD algorithm performs better than or on par with state-of-the-art methods in classical benchmarks.
arXiv Detail & Related papers (2022-11-07T18:13:44Z)
Planning and Learning with Adaptive Lookahead [74.39132848733847]
The Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation. Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration. We propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate.
arXiv Detail & Related papers (2022-01-28T20:26:55Z)
Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints. We develop novel and simple optimization formulations for these problems. As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z)
RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving [74.61723678821049]
We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget. We formulate predictor-based architecture search as learning to rank with pairwise comparisons. The resulting method, RANK-NOSH, reduces the search budget by 5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets.
arXiv Detail & Related papers (2021-08-18T07:45:21Z)
Constraint-Handling Techniques for Particle Swarm Optimization Algorithms [0.0]
Population-based methods can cope with a variety of different problems, including problems of remarkably higher complexity than those traditional methods can handle. The aim here is to develop and compare different constraint-handling techniques (CHTs) suitable for particle swarm optimizers (PSOs), which are incorporated into an algorithm with general-purpose settings.
arXiv Detail & Related papers (2021-01-25T01:49:10Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts. Several works have pursued rigorous analytical justification by studying nonconvex optimization problems. A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.