Statistical Mechanical Analysis of Neural Network Pruning
- URL: http://arxiv.org/abs/2006.16617v3
- Date: Fri, 11 Jun 2021 04:34:46 GMT
- Title: Statistical Mechanical Analysis of Neural Network Pruning
- Authors: Rupam Acharyya, Ankani Chattoraj, Boyu Zhang, Shouman Das, Daniel
Stefankovic
- Abstract summary: We show that DPP based node pruning method is notably superior to competing approaches when tested on real datasets.
We use our theoretical setup to prove this finding and show that even the baseline random edge pruning method performs better than the DPP node pruning method.
- Score: 6.029526715675584
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Deep learning architectures with a huge number of parameters are often
compressed using pruning techniques to ensure computational efficiency of
inference during deployment. Despite multitude of empirical advances, there is
a lack of theoretical understanding of the effectiveness of different pruning
methods. We inspect different pruning techniques under the statistical
mechanics formulation of a teacher-student framework and derive their
generalization error (GE) bounds. It has been shown that Determinantal Point
Process (DPP) based node pruning method is notably superior to competing
approaches when tested on real datasets. Using GE bounds in the aforementioned
setup we provide theoretical guarantees for their empirical observations.
Another consistent finding in literature is that sparse neural networks (edge
pruned) generalize better than dense neural networks (node pruned) for a fixed
number of parameters. We use our theoretical setup to prove this finding and
show that even the baseline random edge pruning method performs better than the
DPP node pruning method. We also validate this empirically on real datasets.
Related papers
- Uncertainty Quantification with the Empirical Neural Tangent Kernel [12.388707890314539]
We propose a post-hoc, sampling-based UQ method for over- parameterized networks at the end of training.
We demonstrate that our method effectively approximates the posterior of a Gaussian process using the empirical Neural Tangent Kernel.
We show that our method not only outperforms competing approaches in computational efficiency (often reducing costs by multiple factors) but also maintains state-of-the-art performance across a variety of UQ metrics for both regression and classification tasks.
arXiv Detail & Related papers (2025-02-05T04:01:34Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z) - Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP)
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS)
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - FaDIn: Fast Discretized Inference for Hawkes Processes with General
Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG)
Results show that the proposed approach leads to an improved estimation of pattern latency than the state-of-the-art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Tackling Oversmoothing of GNNs with Contrastive Learning [35.88575306925201]
Graph neural networks (GNNs) integrate the comprehensive relation of graph data and representation learning capability.
Oversmoothing makes the final representations of nodes indiscriminative, thus deteriorating the node classification and link prediction performance.
We propose the Topology-guided Graph Contrastive Layer, named TGCL, which is the first de-oversmoothing method maintaining all three mentioned metrics.
arXiv Detail & Related papers (2021-10-26T15:56:16Z) - Lost in Pruning: The Effects of Pruning Neural Networks beyond Test
Accuracy [42.15969584135412]
Neural network pruning is a popular technique used to reduce the inference costs of modern networks.
We evaluate whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well.
We find that pruned networks effectively approximate the unpruned model, however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks.
arXiv Detail & Related papers (2021-03-04T13:22:16Z) - On the Importance of Sampling in Learning Graph Convolutional Networks [13.713485304798368]
Graph Convolutional Networks (GCNs) have achieved impressive empirical advancement across a wide variety of graph-related applications.
Despite their success, training GCNs on large graphs suffers from computational and memory issues.
We describe and analyze a general textbftextitdoubly variance reduction schema that can accelerate any sampling method under the memory budget.
arXiv Detail & Related papers (2021-03-03T21:31:23Z) - Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable semi-implicit extrapolational (SIVI)
Our method maps SIVI's evidence to a rigorous inference of lower gradient values.
arXiv Detail & Related papers (2021-01-15T11:39:09Z) - Learning Reasoning Strategies in End-to-End Differentiable Proving [50.9791149533921]
Conditional Theorem Provers learn optimal rule selection strategy via gradient-based optimisation.
We show that Conditional Theorem Provers are scalable and yield state-of-the-art results on the CLUTRR dataset.
arXiv Detail & Related papers (2020-07-13T16:22:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.