Pruning Neural Networks via Coresets and Convex Geometry: Towards No
Assumptions
- URL: http://arxiv.org/abs/2209.08554v1
- Date: Sun, 18 Sep 2022 12:45:26 GMT
- Title: Pruning Neural Networks via Coresets and Convex Geometry: Towards No
Assumptions
- Authors: Murad Tukan, Loay Mualem, Alaa Maalouf
- Abstract summary: Pruning is one of the predominant approaches for compressing deep neural networks (DNNs).
We propose a novel and robust framework for computing coresets (provable data summarizations) for pruning, under mild assumptions on the model's weights and without any assumption on the inputs.
Our method outperforms existing coreset based neural pruning approaches across a wide range of networks and datasets.
- Score: 10.635248457021499
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning is one of the predominant approaches for compressing deep neural
networks (DNNs). Lately, coresets (provable data summarizations) were leveraged
for pruning DNNs, adding the advantage of theoretical guarantees on the
trade-off between the compression rate and the approximation error. However,
coresets in this domain were either data-dependent or generated under
restrictive assumptions on both the model's weights and inputs. In real-world
scenarios, such assumptions are rarely satisfied, limiting the applicability of
coresets. To this end, we suggest a novel and robust framework for computing
such coresets under mild assumptions on the model's weights and without any
assumption on the training data. The idea is to compute the importance of each
neuron in each layer with respect to the output of the following layer. This is
achieved by a combination of the Löwner ellipsoid and the Carathéodory theorem. Our
method is simultaneously data-independent, applicable to various networks and
datasets (due to the simplified assumptions), and theoretically supported.
Experimental results show that our method outperforms existing coreset based
neural pruning approaches across a wide range of networks and datasets. For
example, our method achieved a $62\%$ compression rate on ResNet50 on ImageNet
with a $1.09\%$ drop in accuracy.
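The abstract only sketches how the neuron importances are obtained, so here is a minimal, hedged illustration of the ellipsoid half of that idea: compute the minimum-volume enclosing (Löwner) ellipsoid of a layer's outgoing-weight vectors with Khachiyan's algorithm and rank each neuron by its ellipsoid norm. The function names (mvee, neuron_scores), the choice of outgoing-weight columns as the point set, and the keep-top-half rule are illustrative assumptions, not the paper's actual construction, which also uses the Carathéodory theorem and comes with provable guarantees.

```python
import numpy as np

def mvee(points, tol=1e-4):
    """Minimum-volume enclosing (Loewner) ellipsoid via Khachiyan's algorithm.
    Returns (A, c) with (x - c)^T A (x - c) <= 1 for every input point."""
    n, d = points.shape
    Q = np.vstack([points.T, np.ones(n)])              # (d+1) x n lifted points
    u = np.full(n, 1.0 / n)                            # uniform initial weights
    err = tol + 1.0
    while err > tol:
        X = Q @ np.diag(u) @ Q.T                       # (d+1) x (d+1)
        M = np.einsum('ij,ji->i', Q.T, np.linalg.solve(X, Q))  # leverage-type scores
        j = int(np.argmax(M))
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        err = np.linalg.norm(new_u - u)
        u = new_u
    c = points.T @ u                                   # ellipsoid center
    cov = points.T @ np.diag(u) @ points - np.outer(c, c)
    A = np.linalg.inv(cov) / d                         # ellipsoid shape matrix
    return A, c

def neuron_scores(W_next):
    """Score each neuron of a layer by the ellipsoid norm of its
    outgoing-weight vector (a column of the next layer's weight matrix).
    A crude proxy for ellipsoid-based sensitivity, not the paper's bound."""
    cols = W_next.T                                    # one row per neuron
    A, c = mvee(cols)
    diffs = cols - c
    return np.einsum('ij,jk,ik->i', diffs, A, diffs)   # (w_i - c)^T A (w_i - c)

# Usage (hypothetical weights): keep the half of the neurons with the largest
# scores in a 40-neuron hidden layer feeding a 16-unit layer.
rng = np.random.default_rng(0)
W_next = rng.normal(size=(16, 40))
scores = neuron_scores(W_next)
keep = np.argsort(scores)[-20:]                        # indices of retained neurons
```

One could also build the point set from activations on sample data; using only the weight columns is what keeps this sketch data-free, in the spirit of the paper's data-independence claim.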
Related papers
- Efficient Model Compression for Bayesian Neural Networks [4.179545514579061]
We demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup.
The resulting probabilities are employed for pruning and feature selection on a host of simulated and real-world benchmark data.
arXiv Detail & Related papers (2024-11-01T00:07:59Z) - "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z) - Fast, Distribution-free Predictive Inference for Neural Networks with
Coverage Guarantees [25.798057062452443]
This paper introduces a novel, computationally efficient algorithm for predictive inference (PI).
It requires no distributional assumptions on the data and can be computed faster than existing bootstrap-type methods for neural networks.
arXiv Detail & Related papers (2023-06-11T04:03:58Z) - Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical
Guarantees and Implementation Details [0.5156484100374059]
Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies.
We propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for node selection during training.
We establish the fundamental result of variational posterior consistency together with the characterization of prior parameters.
arXiv Detail & Related papers (2021-08-25T00:48:07Z) - Data-Independent Structured Pruning of Neural Networks via Coresets [21.436706159840018]
We propose the first efficient structured pruning algorithm with a provable trade-off between its compression rate and the approximation error for any future test sample.
Unlike previous works, our coreset is data independent, meaning that it provably guarantees the accuracy of the function for any input $x \in \mathbb{R}^d$, including an adversarial one (a generic sensitivity-sampling sketch in this spirit is given after this list).
arXiv Detail & Related papers (2020-08-19T08:03:09Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
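As referenced in the "Data-Independent Structured Pruning of Neural Networks via Coresets" entry above, the common recipe behind such coresets is sensitivity-based importance sampling: sample items with probability proportional to an upper bound on their importance and reweight them so the original sum is preserved in expectation. The sketch below shows that recipe in a generic, hedged form; the norm-of-outgoing-weights proxy and the function names are assumptions for illustration, not the provable sensitivity bounds derived in that paper or in the main paper above.

```python
import numpy as np

def sensitivity_sampling(sensitivities, m, rng):
    """Generic sensitivity (importance) sampling: draw m indices with
    probability proportional to their sensitivity and return coreset
    weights 1 / (m * p_i), which keep the sampled sum unbiased."""
    p = sensitivities / sensitivities.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    return idx, 1.0 / (m * p[idx])

def prune_layer(W_next, m, rng):
    """Keep m neurons of a hidden layer, sampled by a data-independent proxy
    sensitivity (the norm of each neuron's outgoing weights), and rescale the
    retained columns so the next layer's pre-activations are matched in
    expectation."""
    s = np.linalg.norm(W_next, axis=0)        # proxy sensitivity per neuron
    idx, w = sensitivity_sampling(s, m, rng)
    return idx, W_next[:, idx] * w            # reweighted outgoing weights

# Usage (hypothetical weights): compress a 40-neuron layer to 10 sampled neurons.
rng = np.random.default_rng(1)
W_next = rng.normal(size=(16, 40))
kept, W_small = prune_layer(W_next, m=10, rng=rng)
```

Because sampling is with replacement, a neuron can be drawn more than once; duplicated columns can simply be merged by summing their weights.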