DiffPrune: Neural Network Pruning with Deterministic Approximate Binary
Gates and $L_0$ Regularization
- URL: http://arxiv.org/abs/2012.03653v2
- Date: Sat, 6 Mar 2021 06:55:10 GMT
- Title: DiffPrune: Neural Network Pruning with Deterministic Approximate Binary
Gates and $L_0$ Regularization
- Authors: Yaniv Shulman
- Abstract summary: Modern neural network architectures typically have many millions of parameters and can be pruned significantly without substantial loss in effectiveness.
The contribution of this work is two-fold.
The first is a method for approximating a multivariate Bernoulli random variable by means of a deterministic and differentiable transformation of any real-valued multivariate random variable.
The second is a method for model selection by element-wise multiplication of parameters with approximate binary gates that may be computed deterministically or stochastically and take on exact zero values.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern neural network architectures typically have many millions of
parameters and can be pruned significantly without substantial loss in
effectiveness, which demonstrates that they are over-parameterized. The contribution
of this work is two-fold. The first is a method for approximating a
multivariate Bernoulli random variable by means of a deterministic and
differentiable transformation of any real-valued multivariate random variable.
The second is a method for model selection by element-wise multiplication of
parameters with approximate binary gates that may be computed deterministically
or stochastically and take on exact zero values. Sparsity is encouraged by the
inclusion of a surrogate regularization to the $L_0$ loss. Since the method is
differentiable, it enables straightforward and efficient learning of model
architectures by an empirical risk minimization procedure with stochastic
gradient descent and theoretically enables conditional computation during
training. The method also supports arbitrary group sparsity over parameters
or activations and therefore offers a framework for unstructured or flexible
structured model pruning. To conclude, experiments are performed to demonstrate
the effectiveness of the proposed approach.
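The gating idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's transformation, so the code below swaps in a stand-in: a stretched-and-clipped sigmoid of the kind used in hard-concrete gates. The names `approx_binary_gate` and `l0_surrogate`, the stretch interval `LOW`/`HIGH`, and the example values are all assumptions made for illustration, not the author's method.

```python
import numpy as np

LOW, HIGH = -0.1, 1.1  # assumed stretch interval; not taken from the paper


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def approx_binary_gate(z, rng=None):
    """Map real-valued z to a gate in [0, 1] that can be exactly 0 or 1.

    With rng=None the gate is a deterministic function of z; otherwise
    logistic noise is added to z first, giving a stochastic gate.
    """
    z = np.asarray(z, dtype=float)
    if rng is not None:
        u = rng.uniform(1e-6, 1.0 - 1e-6, size=z.shape)
        z = z + np.log(u) - np.log1p(-u)  # sample of logistic noise
    stretched = sigmoid(z) * (HIGH - LOW) + LOW
    return np.clip(stretched, 0.0, 1.0)  # clipping yields exact zeros and ones


def l0_surrogate(z):
    """Smooth proxy for the number of nonzero gates (hypothetical choice).

    Each term is the probability that the noisy gate above is nonzero.
    """
    z = np.asarray(z, dtype=float)
    return float(np.sum(sigmoid(z - np.log(-LOW / HIGH))))


# Illustrative values only: four weights, each with its own gate parameter.
weights = np.array([0.8, -1.5, 0.3, 2.0])
gate_params = np.array([-6.0, 4.0, -3.0, 0.5])

gates = approx_binary_gate(gate_params)  # deterministic gates
pruned = weights * gates                 # element-wise multiplication
penalty = l0_surrogate(gate_params)      # would be added to the training loss
```

In this sketch the first and third weights are multiplied by an exact zero and are therefore pruned. Sharing one gate parameter across a group of weights (for example, by broadcasting over a row) would give a group-sparse variant under the same assumptions.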
Related papers
- Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled differential equations with non-uniform diffusion.
We provide strong theoretical guarantees, including finite-sample bounds for $L_2$, $L_\infty$, and risk metrics, with learning rates adaptive to coefficients' regularity.
Our method is available as an open-source Python library.
arXiv Detail & Related papers (2024-11-04T11:09:58Z)
- Learning minimal representations of stochastic processes with variational autoencoders [52.99137594502433]
We introduce an unsupervised machine learning approach to determine the minimal set of parameters required to describe a process.
Our approach enables the autonomous discovery of unknown parameters describing processes.
arXiv Detail & Related papers (2023-07-21T14:25:06Z)
- Robust scalable initialization for Bayesian variational inference with multi-modal Laplace approximations [0.0]
Variational mixtures with full-covariance structures suffer from quadratic growth in the number of variational parameters with the number of model parameters.
We propose a method for constructing an initial Gaussian model approximation that can be used to warm-start variational inference.
arXiv Detail & Related papers (2023-07-12T19:30:04Z)
- Structured model selection via $\ell_1-\ell_2$ optimization [1.933681537640272]
We develop a learning approach for identifying structured dynamical systems.
We show that if the set of candidate functions forms a bounded system, the recovery is stable and the error is bounded.
arXiv Detail & Related papers (2023-05-27T12:51:26Z)
- Scalable and adaptive variational Bayes methods for Hawkes processes [4.580983642743026]
We propose a novel sparsity-inducing procedure, and derive an adaptive mean-field variational algorithm for the popular sigmoid Hawkes processes.
Our algorithm is parallelisable and therefore computationally efficient in high-dimensional settings.
arXiv Detail & Related papers (2022-12-01T05:35:32Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- Causality-based Counterfactual Explanation for Classification Models [11.108866104714627]
We propose a prototype-based counterfactual explanation framework (ProCE).
ProCE is capable of preserving the causal relationship underlying the features of the counterfactual data.
In addition, we design a novel gradient-free optimization based on the multi-objective genetic algorithm that generates the counterfactual explanations.
arXiv Detail & Related papers (2021-05-03T09:25:59Z)
- Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables [16.643346012854156]
Estimating the gradients of nodes is one of the crucial research questions in the deep generative modeling community.
This paper proposes a general version of the Gumbel-Softmax estimator with continuous relaxation.
arXiv Detail & Related papers (2020-03-04T01:13:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.