Exploring the Lottery Ticket Hypothesis with Explainability Methods:
Insights into Sparse Network Performance
- URL: http://arxiv.org/abs/2307.13698v1
- Date: Fri, 7 Jul 2023 18:33:52 GMT
- Title: Exploring the Lottery Ticket Hypothesis with Explainability Methods:
Insights into Sparse Network Performance
- Authors: Shantanu Ghosh, Kayhan Batmanghelich
- Abstract summary: The Lottery Ticket Hypothesis (LTH) finds a subnetwork within a deep network whose performance is comparable or superior to that of the original model.
In this work, we examine why the performance of the pruned networks gradually increases or decreases.
- Score: 13.773050123620592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discovering a high-performing sparse network within a massive neural network
is advantageous for deploying such models on devices with limited storage, such as
mobile phones. Additionally, model explainability is essential to fostering
trust in AI. The Lottery Ticket Hypothesis (LTH) finds a network within a deep
network with comparable or superior performance to the original model. However,
little work has examined the success or failure of LTH in terms of
explainability. In this work, we examine why the performance of the pruned
networks gradually increases or decreases. Using Grad-CAM and Post-hoc concept
bottleneck models (PCBMs), respectively, we investigate the explainability of
pruned networks in terms of pixels and high-level concepts. We perform
extensive experiments across vision and medical imaging datasets. As more
weights are pruned, the performance of the network degrades. The discovered
concepts and pixels from the pruned networks are inconsistent with those of the
original network, which is a possible reason for the drop in performance.
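To make the experimental setup concrete, here is a minimal sketch, assuming a PyTorch image classifier: it uses standard global magnitude pruning to obtain LTH-style sparse subnetworks and compares Grad-CAM heatmaps of the dense and pruned models on the same image. The helper names (`magnitude_prune`, `gradcam_heatmap`, `lth_gradcam_study`), the 20% per-round pruning ratio, and the MSE-based drift score are illustrative placeholders rather than the authors' pipeline; the concept-level (PCBM) analysis would follow the same compare-against-the-dense-model pattern.

```python
# Minimal sketch (assumed PyTorch setup): iterative magnitude pruning in the
# spirit of LTH, plus a Grad-CAM consistency check between the dense and the
# pruned network. Helper names, the per-round pruning ratio, and the MSE
# "drift" score are illustrative, not the authors' exact pipeline.
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune


def magnitude_prune(model, amount=0.2):
    """Globally remove the smallest-magnitude conv/linear weights."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                              amount=amount)
    return model


def gradcam_heatmap(model, image, target_layer, class_idx):
    """Plain Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the chosen class score."""
    acts, grads = {}, {}
    fwd = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    bwd = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    score = model(image.unsqueeze(0))[0, class_idx]
    model.zero_grad()
    score.backward()
    fwd.remove()
    bwd.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)        # GAP over gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1)).squeeze(0)   # weighted activation maps
    return (cam / (cam.max() + 1e-8)).detach()


def lth_gradcam_study(model, image, layer_name, class_idx, rounds=5):
    """Prune iteratively and track how far the explanation drifts from the
    dense network's heatmap (the pixel-level consistency question)."""
    layer = dict(model.named_modules())[layer_name]
    reference = gradcam_heatmap(model, image, layer, class_idx)  # dense model
    for r in range(rounds):
        magnitude_prune(model, amount=0.2)
        # A full LTH pipeline would rewind the surviving weights and retrain here.
        cam = gradcam_heatmap(model, image, layer, class_idx)
        drift = F.mse_loss(cam, reference).item()
        print(f"round {r}: heatmap drift vs. dense model = {drift:.4f}")
```

In an LTH pipeline, each pruning round is followed by weight rewinding and retraining, and the paper's pixel- and concept-level comparisons are aggregated over whole vision and medical imaging datasets rather than a single image; the sketch only illustrates where those comparisons plug in.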
Related papers
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes model learning by paying closer attention to training samples with large differences in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- Impact of Disentanglement on Pruning Neural Networks [16.077795265753917]
Disentangled latent representations produced by variational autoencoder (VAE) networks are a promising approach for achieving model compression.
We make use of the Beta-VAE framework combined with a standard criterion for pruning to investigate the impact of forcing the network to learn disentangled representations.
arXiv Detail & Related papers (2023-07-19T13:58:01Z)
- Network Degeneracy as an Indicator of Training Performance: Comparing Finite and Infinite Width Angle Predictions [3.04585143845864]
We show that as networks get deeper, they become more susceptible to degeneracy.
We use a simple algorithm that can accurately predict the level of degeneracy for any given fully connected ReLU network architecture.
arXiv Detail & Related papers (2023-06-02T13:02:52Z)
- The Lottery Ticket Hypothesis for Self-attention in Convolutional Neural Network [69.54809052377189]
Recently, many plug-and-play self-attention modules (SAMs) have been proposed to enhance model generalization by exploiting the internal information of deep convolutional neural networks (CNNs).
We empirically find and verify some counterintuitive phenomena: (a) connecting the SAMs to all the blocks may not always bring the largest performance boost, and connecting them to only some blocks can be even better; (b) adding SAMs to a CNN may not always bring a performance boost, and may instead even harm the performance of the original CNN backbone.
arXiv Detail & Related papers (2022-07-16T07:08:59Z)
- Self-Compression in Bayesian Neural Networks [0.9176056742068814]
We propose a new insight into network compression through the Bayesian framework.
We show that Bayesian neural networks automatically discover redundancy in model parameters, thus enabling self-compression.
Our experimental results show that the network architecture can be successfully compressed by deleting parameters identified by the network itself.
arXiv Detail & Related papers (2021-11-10T21:19:40Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Leveraging Sparse Linear Layers for Debuggable Deep Networks [86.94586860037049]
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks (a minimal sketch of this idea appears after this list).
The resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks.
arXiv Detail & Related papers (2021-05-11T08:15:25Z)
- Prior knowledge distillation based on financial time series [0.8756822885568589]
We propose to use neural networks to represent indicators and train a large network constructed of smaller networks as feature layers.
In numerical experiments, we find that our algorithm is faster and more accurate than traditional methods on real financial datasets.
arXiv Detail & Related papers (2020-06-16T15:26:06Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
- Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection [21.48875255723581]
A mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression.
Experiments on typical network architectures and benchmark datasets demonstrate that the proposed method could achieve better or comparable results.
arXiv Detail & Related papers (2019-12-29T14:11:33Z)
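For the "Leveraging Sparse Linear Layers for Debuggable Deep Networks" entry above, the core idea can be illustrated with a short sketch, assuming features extracted from a frozen backbone and an L1-regularized logistic-regression probe from scikit-learn; the function names and the regularization strength are placeholders, not that paper's implementation.

```python
# Minimal sketch (assumed setup): fit an L1-regularized linear probe on frozen
# deep features and inspect which features each class actually relies on.
# The backbone, the feature matrix, and the regularization strength are
# placeholders for whatever the surrounding pipeline provides.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_sparse_probe(features, labels, l1_strength=0.1):
    """features: (n_samples, n_features) activations from a frozen backbone."""
    probe = LogisticRegression(penalty="l1", solver="saga",
                               C=1.0 / l1_strength, max_iter=2000)
    probe.fit(features, labels)
    return probe


def top_features(probe, class_idx, k=10):
    """The handful of feature directions the sparse probe uses for one class
    (multiclass coef_ layout assumed); inspecting them is what makes the
    predictions easier to debug."""
    w = probe.coef_[class_idx]
    order = np.argsort(-np.abs(w))[:k]
    return [(int(i), float(w[i])) for i in order if w[i] != 0.0]
```

Because most coefficients are driven to zero, each prediction depends on only a few feature directions, which is what makes spurious correlations and biases easier to spot.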