Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
- URL: http://arxiv.org/abs/2205.07320v1
- Date: Sun, 15 May 2022 15:58:27 GMT
- Title: Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
- Authors: Keitaro Sakamoto, Issei Sato
- Abstract summary: We show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior.
We offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets.
- Score: 25.157282476221482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lottery ticket hypothesis (LTH) has attracted attention because it can
explain why over-parameterized models often show high generalization ability.
It is known that when we use iterative magnitude pruning (IMP), an algorithm
that finds sparse subnetworks, called winning tickets, with high generalization
ability that can be trained from the initial weights in isolation, a large
initial learning rate does not work well in deep neural networks such as
ResNet. However, since a large initial learning rate generally helps the
optimizer converge to flatter minima, we hypothesize that winning tickets have
relatively sharp minima, which is usually considered a disadvantage for
generalization ability. In this paper, we confirm this hypothesis and
show that the PAC-Bayesian theory can provide an explicit understanding of the
relationship between LTH and generalization behavior. On the basis of our
experimental findings that flatness is useful for improving accuracy and
robustness to label noise and that the distance from the initial weights is
deeply involved in winning tickets, we offer the PAC-Bayes bound using a
spike-and-slab distribution to analyze winning tickets. Finally, we revisit
existing algorithms for finding winning tickets from a PAC-Bayesian perspective
and provide new insights into these methods.
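As background for the abstract above, the following is a minimal sketch of iterative magnitude pruning (IMP) with rewinding to the initial weights. The `train` callback, the flattened-weight representation, and hyperparameters such as the per-round pruning fraction are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def iterative_magnitude_pruning(init_weights, train, rounds=5, prune_frac=0.2):
    """Minimal sketch of IMP with rewinding to the initial weights.

    init_weights : 1-D array of the network's initial weights (flattened).
    train        : hypothetical callback that trains the masked network
                   starting from the given weights and returns trained weights.
    """
    mask = np.ones_like(init_weights)               # 1 = kept, 0 = pruned
    for _ in range(rounds):
        # always retrain from the *initial* weights, restricted to the mask
        trained = train(init_weights * mask, mask)
        kept = np.flatnonzero(mask)
        # prune the smallest-magnitude fraction of the surviving weights
        k = int(prune_frac * kept.size)
        smallest = kept[np.argsort(np.abs(trained[kept]))[:k]]
        mask[smallest] = 0.0
    # candidate winning ticket: initial weights restricted to the surviving mask
    return init_weights * mask, mask
```

A winning ticket, in the abstract's sense, is a subnetwork found this way that, when retrained from the initial weights, matches the accuracy of the dense network.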
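For reference on the PAC-Bayesian side, a standard McAllester/Maurer-style bound (background only; the paper's own bound uses a spike-and-slab distribution and need not take exactly this form) states that, with probability at least $1-\delta$ over an i.i.d. sample of size $n$, for every posterior $Q$ and a fixed prior $P$ over weights:

```latex
\mathbb{E}_{w \sim Q}\!\left[L(w)\right]
  \;\le\;
  \mathbb{E}_{w \sim Q}\!\left[\hat{L}(w)\right]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

Here $L$ and $\hat{L}$ denote the expected and empirical losses. The bound tightens when the posterior stays close to the prior (small distance from the initial weights) and when it tolerates weight perturbations (flat minima), which is how the two experimental observations in the abstract enter a PAC-Bayes analysis. A spike-and-slab distribution, a mixture of a component concentrated at zero and a broader component, is a natural posterior family for describing a pruned network in this framework.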
Related papers
- Iterative Magnitude Pruning as a Renormalisation Group: A Study in The
Context of The Lottery Ticket Hypothesis [0.0]
This thesis focuses on the Lottery Ticket Hypothesis (LTH).
The LTH posits that within extensive Deep Neural Networks (DNNs), smaller, trainable "winning tickets" can achieve performance comparable to the full model.
A key process in LTH, Iterative Magnitude Pruning (IMP), incrementally eliminates minimal weights, emulating stepwise learning in DNNs.
The thesis also examines whether a winning ticket that works well for one specific problem can work well for other, similar problems.
arXiv Detail & Related papers (2023-08-06T14:36:57Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition, take its performance as our benchmark, and articulate the Dual Lottery Ticket Hypothesis (DLTH).
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate the DLTH.
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Demystify Optimization and Generalization of Over-parameterized
PAC-Bayesian Learning [20.295960197612743]
PAC-Bayesian is an analysis framework where the training error can be expressed as the weighted average of the hypotheses in the posterior distribution.
We show that when PAC-Bayes learning is applied, the convergence result corresponds to solving a kernel ridge regression.
We further characterize the uniform PAC-Bayesian generalization bound, which improves over the Rademacher complexity-based bound for non-probabilistic neural networks.
arXiv Detail & Related papers (2022-02-04T03:49:11Z) - User-friendly introduction to PAC-Bayes bounds [0.6599344783327052]
In statistical learning theory there is a set of tools designed to understand the generalization ability of procedures: PAC-Bayesian or PAC-Bayes bounds.
Very recently, PAC-Bayes bounds have received considerable attention: for example, there was a workshop, "PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain.
arXiv Detail & Related papers (2021-10-21T15:50:05Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Semi-verified PAC Learning from the Crowd [7.594050968868919]
We study the problem of crowdsourced PAC learning of threshold functions.
We show that under the semi-verified model of Charikar et al., it is possible to PAC learn the underlying hypothesis class with a manageable amount of label queries.
arXiv Detail & Related papers (2021-06-13T20:05:16Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - The Elastic Lottery Ticket Hypothesis [106.79387235014379]
The Lottery Ticket Hypothesis raises keen attention to identifying sparse trainable subnetworks, or winning tickets.
The most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning.
We propose a variety of strategies to tweak the winning tickets found from different networks of the same model family.
arXiv Detail & Related papers (2021-03-30T17:53:45Z) - A General Framework for the Practical Disintegration of PAC-Bayesian
Bounds [2.516393111664279]
We introduce new PAC-Bayesian generalization bounds whose originality is that they are disintegrated bounds.
Our bounds are easily optimizable and can be used to design learning algorithms.
arXiv Detail & Related papers (2021-02-17T09:36:46Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results; a minimal sketch of the projection idea appears after this entry.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
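To make the distance-based idea in the last entry concrete, here is a minimal sketch under the assumption that the constraint is an L2 ball around the pre-trained weights, enforced by projection after each optimizer step; the function name and the radius are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def project_onto_ball(weights, init_weights, radius):
    """Project weights onto the L2 ball of the given radius centred on init_weights.

    Generic sketch of distance-based regularisation for fine-tuning; the
    referenced paper's actual constraint set and optimiser may differ.
    """
    diff = weights - init_weights
    norm = np.linalg.norm(diff)
    if norm <= radius:
        return weights                      # already inside the ball
    return init_weights + diff * (radius / norm)   # rescale onto the boundary
```

Keeping the hypothesis class within such a ball bounds the distance from the initial weights, the same quantity the main abstract identifies as deeply involved in winning tickets.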
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.