Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
- URL: http://arxiv.org/abs/2205.07320v1
- Date: Sun, 15 May 2022 15:58:27 GMT
- Title: Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
- Authors: Keitaro Sakamoto, Issei Sato
- Abstract summary: We show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior.
We offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets.
- Score: 25.157282476221482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lottery ticket hypothesis (LTH) has attracted attention because it can
explain why over-parameterized models often show high generalization ability.
It is known that when we use iterative magnitude pruning (IMP), an algorithm
that finds sparse subnetworks, called winning tickets, with high generalization
ability that can be trained from the initial weights in isolation, a large
initial learning rate does not work well in deep neural networks such as
ResNet. However, since a large initial learning rate generally helps the
optimizer converge to flatter minima, we hypothesize that winning tickets have
relatively sharp minima, which is usually considered a disadvantage for
generalization ability. In this paper, we confirm this hypothesis and
show that the PAC-Bayesian theory can provide an explicit understanding of the
relationship between LTH and generalization behavior. On the basis of our
experimental findings that flatness is useful for improving accuracy and
robustness to label noise and that the distance from the initial weights is
deeply involved in winning tickets, we offer the PAC-Bayes bound using a
spike-and-slab distribution to analyze winning tickets. Finally, we revisit
existing algorithms for finding winning tickets from a PAC-Bayesian perspective
and provide new insights into these methods.
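As background for the abstract above, the following is a minimal sketch of iterative magnitude pruning (IMP) with rewinding to the initial weights. The `train` callback, the flattened-weight representation, and hyperparameters such as the per-round pruning fraction are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def iterative_magnitude_pruning(init_weights, train, rounds=5, prune_frac=0.2):
    """Minimal sketch of IMP with rewinding to the initial weights.

    init_weights : 1-D array of the network's initial weights (flattened).
    train        : hypothetical callback that trains the masked network
                   starting from the given weights and returns trained weights.
    """
    mask = np.ones_like(init_weights)               # 1 = kept, 0 = pruned
    for _ in range(rounds):
        # always retrain from the *initial* weights, restricted to the mask
        trained = train(init_weights * mask, mask)
        kept = np.flatnonzero(mask)
        # prune the smallest-magnitude fraction of the surviving weights
        k = int(prune_frac * kept.size)
        smallest = kept[np.argsort(np.abs(trained[kept]))[:k]]
        mask[smallest] = 0.0
    # candidate winning ticket: initial weights restricted to the surviving mask
    return init_weights * mask, mask
```

A winning ticket, in the abstract's sense, is a subnetwork found this way that, when retrained from the initial weights, matches the accuracy of the dense network.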
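For reference on the PAC-Bayesian side, a standard McAllester/Maurer-style bound (background only; the paper's own bound uses a spike-and-slab distribution and need not take exactly this form) states that, with probability at least $1-\delta$ over an i.i.d. sample of size $n$, for every posterior $Q$ and a fixed prior $P$ over weights:

```latex
\mathbb{E}_{w \sim Q}\!\left[L(w)\right]
  \;\le\;
  \mathbb{E}_{w \sim Q}\!\left[\hat{L}(w)\right]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

Here $L$ and $\hat{L}$ denote the expected and empirical losses. The bound tightens when the posterior stays close to the prior (small distance from the initial weights) and when it tolerates weight perturbations (flat minima), which is how the two experimental observations in the abstract enter a PAC-Bayes analysis. A spike-and-slab distribution, a mixture of a component concentrated at zero and a broader component, is a natural posterior family for describing a pruned network in this framework.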
Related papers
- Iterative Magnitude Pruning as a Renormalisation Group: A Study in The
Context of The Lottery Ticket Hypothesis [0.0]
This thesis focuses on the Lottery Ticket Hypothesis (LTH).
The LTH posits that within extensive Deep Neural Networks (DNNs), smaller, trainable "winning tickets" can achieve performance comparable to the full model.
A key process in LTH, Iterative Magnitude Pruning (IMP), incrementally eliminates minimal weights, emulating stepwise learning in DNNs.
The thesis also examines whether a winning ticket that works well for one specific problem can work well for other, similar problems.
arXiv Detail & Related papers (2023-08-06T14:36:57Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition, take its performance as our benchmark, and articulate the Dual Lottery Ticket Hypothesis (DLTH).
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate the DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Demystify Optimization and Generalization of Over-parameterized
PAC-Bayesian Learning [20.295960197612743]
PAC-Bayesian is an analysis framework where the training error can be expressed as the weighted average of the hypotheses in the posterior distribution.
We show that when PAC-Bayes learning is applied, the convergence result corresponds to solving a kernel ridge regression.
We further characterize the uniform PAC-Bayesian generalization bound, which improves over the Rademacher complexity-based bound for non-probabilistic neural networks.
arXiv Detail & Related papers (2022-02-04T03:49:11Z) - User-friendly introduction to PAC-Bayes bounds [0.6599344783327052]
In statistical learning theory there is a set of tools designed to understand the generalization ability of procedures: PAC-Bayesian or PAC-Bayes bounds.
Very recently, PAC-Bayes bounds have received considerable attention: for example, there was a workshop, "PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain.
arXiv Detail & Related papers (2021-10-21T15:50:05Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Semi-verified PAC Learning from the Crowd [7.594050968868919]
We study the problem of crowdsourced PAC learning of threshold functions.
We show that under the semi-verified model of Charikar et al., it is possible to PAC learn the underlying hypothesis class with a manageable amount of label queries.
arXiv Detail & Related papers (2021-06-13T20:05:16Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis raises keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - A General Framework for the Practical Disintegration of PAC-Bayesian
Bounds [2.516393111664279]
We introduce new PAC-Bayesian generalization bounds whose originality is that they are disintegrated bounds.
Our bounds are easily optimizable and can be used to design learning algorithms.
arXiv Detail & Related papers (2021-02-17T09:36:46Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results; a minimal sketch of the projection idea appears after this entry.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
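To make the distance-based idea in the last entry concrete, here is a minimal sketch under the assumption that the constraint is an L2 ball around the pre-trained weights, enforced by projection after each optimizer step; the function name and the radius are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def project_onto_ball(weights, init_weights, radius):
    """Project weights onto the L2 ball of the given radius centred on init_weights.

    Generic sketch of distance-based regularisation for fine-tuning; the
    referenced paper's actual constraint set and optimiser may differ.
    """
    diff = weights - init_weights
    norm = np.linalg.norm(diff)
    if norm <= radius:
        return weights                      # already inside the ball
    return init_weights + diff * (radius / norm)   # rescale onto the boundary
```

Keeping the hypothesis class within such a ball bounds the distance from the initial weights, the same quantity the main abstract identifies as deeply involved in winning tickets.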
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.