Generalization Bounds for Magnitude-Based Pruning via Sparse Matrix
Sketching
- URL: http://arxiv.org/abs/2305.18789v2
- Date: Sat, 24 Jun 2023 04:59:52 GMT
- Title: Generalization Bounds for Magnitude-Based Pruning via Sparse Matrix
Sketching
- Authors: Etash Kumar Guha, Prasanjit Dubey, Xiaoming Huo
- Abstract summary: We build on Arora et al. [2018], where the error depends on (1) the approximation induced by pruning and (2) the number of parameters in the pruned model.
The pruned estimates are close to the unpruned functions with high probability, which improves the first criterion.
We empirically verify the success of this new method on ReLU-activated Feed Forward Networks on the MNIST and CIFAR10 datasets.
- Score: 2.1485350418225244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we derive a novel bound on the generalization error of
Magnitude-Based pruning of overparameterized neural networks. Our work builds
on the bounds in Arora et al. [2018], where the error depends on (1) the
approximation induced by pruning and (2) the number of parameters in the
pruned model, and improves upon standard norm-based generalization bounds. The
pruned estimates obtained using our new Magnitude-Based compression algorithm
are close to the unpruned functions with high probability, which improves the
first criterion. Using Sparse Matrix Sketching, the space of the pruned
matrices can be efficiently represented in the space of dense matrices of much
smaller dimensions, thereby lowering the second criterion. This leads to a
stronger generalization bound than many state-of-the-art methods, breaking new
ground in algorithm development for pruning and for bounding the
generalization error of overparameterized models. Beyond this, we extend our
results to obtain a generalization bound for Iterative Pruning [Frankle and
Carbin, 2018]. We empirically verify the success of this new method on
ReLU-activated Feed Forward Networks on the MNIST and CIFAR10 datasets.
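To make the two terms in the bound concrete, here is a minimal, illustrative Python sketch of magnitude-based pruning on a small ReLU feed-forward network. It is not the authors' compression algorithm and it omits their Sparse Matrix Sketching step; the keep-fraction rule, the `magnitude_prune` helper, and the random weights are our own assumptions for illustration.

```python
# A minimal, illustrative sketch of magnitude-based pruning (not the paper's
# exact compression algorithm): keep the largest-magnitude fraction of weights
# in each layer of a ReLU feed-forward network and zero out the rest.
import numpy as np

def magnitude_prune(W, keep_fraction=0.1):
    """Zero out all but the top `keep_fraction` of entries of W by |value|."""
    k = max(1, int(round(keep_fraction * W.size)))
    threshold = np.sort(np.abs(W), axis=None)[-k]   # k-th largest magnitude
    mask = np.abs(W) >= threshold
    return W * mask, mask

def forward(x, layers):
    """ReLU feed-forward pass through a list of (W, b) layers."""
    h = x
    for W, b in layers[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = layers[-1]
    return h @ W + b                                 # linear output layer

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((784, 256)) / np.sqrt(784), np.zeros(256)),
          (rng.standard_normal((256, 10)) / np.sqrt(256), np.zeros(10))]

# Prune every layer and compare outputs on a random input.
pruned_layers, kept = [], 0
for W, b in layers:
    Wp, mask = magnitude_prune(W, keep_fraction=0.2)
    pruned_layers.append((Wp, b))
    kept += int(mask.sum())

x = rng.standard_normal((1, 784))
gap = np.linalg.norm(forward(x, layers) - forward(x, pruned_layers))
print(f"kept weights: {kept}, output gap on a random input: {gap:.4f}")
```

In this toy setup, the printed output gap plays the role of the pruning-induced approximation term, and the count of surviving weights stands in for the parameter count of the pruned model; the paper's sketching step would further represent the surviving sparse matrices in a much lower-dimensional dense space.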
Related papers
- Generalization and Risk Bounds for Recurrent Neural Networks [3.0638061480679912]
We establish a new generalization error bound for vanilla RNNs.
We provide a unified framework to calculate the Rademacher complexity that can be applied to a variety of loss functions.
arXiv Detail & Related papers (2024-11-05T03:49:06Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z)
- Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of the fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models [33.36787620121057]
We prove a new generalization bound that holds for any class of linear predictors in Gaussian space.
We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021).
We show that application of our bound using localized Gaussian width will generally be sharp for empirical risk minimizers.
arXiv Detail & Related papers (2022-10-21T16:16:55Z)
- Error-Correcting Neural Networks for Two-Dimensional Curvature Computation in the Level-Set Method [0.0]
We present an error-neural-modeling-based strategy for approximating two-dimensional curvature in the level-set method.
Our main contribution is a redesigned hybrid solver that relies on numerical schemes to enable machine-learning operations on demand.
arXiv Detail & Related papers (2022-01-22T05:14:40Z)
- A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT). (A toy sketch-and-solve example follows this list.)
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
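For the last entry above, here is a toy sketch-and-solve version of L2-regularized least squares using a plain Gaussian embedding. The paper additionally covers SRHT embeddings and picks the sketch size adaptively from the effective dimension; the fixed `sketch_size` and the `sketched_ridge` helper below are our simplifications, not that paper's algorithm.

```python
# Toy sketch-and-solve for L2-regularized least squares: a Gaussian embedding S
# compresses the tall matrix A before solving the (much smaller) ridge problem.
import numpy as np

def sketched_ridge(A, b, lam, sketch_size, rng):
    """Solve min_x ||S(Ax - b)||^2 + lam * ||x||^2 with a Gaussian sketch S."""
    n, d = A.shape
    S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
    SA, Sb = S @ A, S @ b
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

rng = np.random.default_rng(0)
n, d = 5000, 50
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

x_exact = np.linalg.solve(A.T @ A + 1.0 * np.eye(d), A.T @ b)   # exact ridge
x_sketch = sketched_ridge(A, b, lam=1.0, sketch_size=500, rng=rng)
print("relative error vs. exact ridge:",
      np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```

The sketch reduces the n x d least-squares problem to a sketch_size x d one; with a sketch size comfortably above the (effective) dimension, the sketched solution stays close to the exact regularized solution.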