A General Framework for the Practical Disintegration of PAC-Bayesian
Bounds
- URL: http://arxiv.org/abs/2102.08649v3
- Date: Mon, 18 Sep 2023 09:02:40 GMT
- Title: A General Framework for the Practical Disintegration of PAC-Bayesian
Bounds
- Authors: Paul Viallard (SIERRA), Pascal Germain, Amaury Habrard (LHC), Emilie
Morvant (LHC)
- Abstract summary: We introduce new PAC-Bayesian generalization bounds whose originality is that they are disintegrated.
Our bounds are easily optimizable and can be used to design learning algorithms.
- Score: 2.516393111664279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: PAC-Bayesian bounds are known to be tight and informative when studying the
generalization ability of randomized classifiers. However, they require a loose
and costly derandomization step when applied to some families of deterministic
models such as neural networks. As an alternative to this step, we introduce
new PAC-Bayesian generalization bounds whose originality is that they are
disintegrated, i.e., they give guarantees for a single hypothesis
instead of the usual averaged analysis. Our bounds are easily optimizable and
can be used to design learning algorithms. We illustrate this behavior on
neural networks, and we show a significant practical improvement over the
state-of-the-art framework.
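To make the disintegrated setting concrete, the sketch below trains a small network by minimizing an objective of the disintegrated form: the empirical risk of one single sampled hypothesis plus a complexity penalty built from the log density ratio between the posterior and the prior evaluated at that hypothesis. This is a minimal PyTorch illustration assuming a McAllester-style relaxation, isotropic Gaussian posterior and prior, synthetic data, and arbitrary hyperparameters; it is not the exact bound, architecture, or experimental setup of the paper.

```python
# Non-authoritative sketch: training a single sampled hypothesis with a disintegrated
# PAC-Bayes-style objective  emp_risk(h) + sqrt((ln rho(h)/pi(h) + ln(2*sqrt(m)/delta)) / (2m)),
# where h = w + sigma * eps is ONE draw from the posterior rho = N(w, sigma^2 I).
# The McAllester-style relaxation and every hyperparameter below are illustrative assumptions,
# not the exact bound or setup of the paper.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
m, d_in, d_hid = 1000, 20, 64                      # toy sample size and widths (assumed)
X = torch.randn(m, d_in)
y = (X[:, 0] > 0).long()                           # synthetic binary labels

# Learned posterior mean w; the prior pi = N(w0, sigma0^2 I) is centred at the initialisation.
params = [torch.randn(d_hid, d_in) * 0.1, torch.zeros(d_hid),
          torch.randn(2, d_hid) * 0.1, torch.zeros(2)]
for p in params:
    p.requires_grad_(True)
w0 = torch.cat([p.detach().flatten() for p in params])
sigma, sigma0, delta = 0.05, 0.1, 0.05             # posterior/prior scales, confidence (assumed)
n_params = w0.numel()

def surrogate_risk(eps):
    """Differentiable surrogate of the empirical risk of the single perturbed network."""
    chunks, start = [], 0
    for p in params:
        chunks.append(p + sigma * eps[start:start + p.numel()].view_as(p))
        start += p.numel()
    W1, b1, W2, b2 = chunks
    logits = F.linear(F.relu(F.linear(X, W1, b1)), W2, b2)
    return F.cross_entropy(logits, y)

opt = torch.optim.Adam(params, lr=1e-2)
for step in range(300):
    eps = torch.randn(n_params)                    # one hypothesis h ~ rho (reparameterised)
    emp_risk = surrogate_risk(eps)
    w = torch.cat([p.flatten() for p in params])
    w_h = w + sigma * eps
    # Disintegrated complexity term: log density ratio ln rho(w_h) / pi(w_h) for the Gaussians above.
    log_ratio = ((w_h - w0).pow(2).sum() / (2 * sigma0**2)
                 - eps.pow(2).sum() / 2
                 + n_params * math.log(sigma0 / sigma))
    complexity = torch.clamp(log_ratio + math.log(2 * math.sqrt(m) / delta), min=0.0)
    bound = emp_risk + torch.sqrt(complexity / (2 * m))   # the optimizable objective
    opt.zero_grad()
    bound.backward()
    opt.step()

print(f"final objective {bound.item():.3f} "
      f"(surrogate risk {emp_risk.item():.3f}, penalty {torch.sqrt(complexity / (2 * m)).item():.3f})")
```

In practice one would report the 0-1 empirical risk of the final sampled network together with the penalty term; whether that yields a valid high-probability certificate depends on using the paper's actual disintegrated bound rather than the simplified surrogate above.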
Related papers
- Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation [4.239829789304117]
We apply PAC-Bayesian theory to the learning-to-optimize setting.
We present the first framework to learn optimization algorithms with provable generalization guarantees.
Our learned algorithms provably outperform related ones derived from a (deterministic) worst-case analysis.
arXiv Detail & Related papers (2024-04-04T08:24:57Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
A sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization [48.26492774959634]
We develop a compression approach based on quantizing neural network parameters in a linear subspace.
We find that large models can be compressed to a much greater extent than previously known, in keeping with Occam's razor.
arXiv Detail & Related papers (2022-11-24T13:50:16Z)
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these meta-learning guarantees improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
- PAC-Bayesian Learning of Optimization Algorithms [6.624726878647541]
We apply the PAC-Bayes theory to the setting of learning-to-optimize.
We learn optimization algorithms with provable generalization guarantees (PAC bounds) and an explicit trade-off between a high probability of convergence and a high convergence speed.
Our results rely on PAC-Bayes bounds for general, unbounded loss functions based on exponential families.
arXiv Detail & Related papers (2022-10-20T09:16:36Z)
- PAC-Bayesian Domain Adaptation Bounds for Multiclass Learners [13.33450619901885]
We propose the first PAC-Bayesian adaptation bounds for multiclass learners.
For divergences dependent on a Gibbs predictor, we propose additional PAC-Bayesian adaptation bounds.
We apply our bounds to analyze a common adaptation algorithm that uses neural networks.
arXiv Detail & Related papers (2022-07-12T17:07:59Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in deep-learning scenarios.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Demystify Optimization and Generalization of Over-parameterized PAC-Bayesian Learning [20.295960197612743]
PAC-Bayes is an analysis framework in which the training error can be expressed as a weighted average over the hypotheses in the posterior distribution.
We show that when PAC-Bayes learning is applied, the convergence result corresponds to solving a kernel ridge regression.
We further characterize the uniform PAC-Bayesian generalization bound, which improves over the Rademacher-complexity-based bound for non-probabilistic neural networks.
arXiv Detail & Related papers (2022-02-04T03:49:11Z)
- Wide stochastic networks: Gaussian limit and PAC-Bayesian training [21.979820411421827]
We show that an extremely large network is approximated by a Gaussian process, both before and during training.
The explicit evaluation of the output distribution allows for a PAC-Bayesian training procedure that directly optimizes the bound (a generic sketch of this kind of bound computation is given after this list).
For a large but finite-width network, we show empirically on MNIST that this training approach can outperform standard PAC-Bayesian methods.
arXiv Detail & Related papers (2021-06-17T20:25:38Z)
- PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees [77.67258935234403]
We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning.
We develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-level regularization.
arXiv Detail & Related papers (2020-02-13T15:01:38Z)
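In contrast with the disintegrated guarantees of the main paper, several of the entries above (e.g., the wide stochastic networks and PACOH papers) work with classical averaged PAC-Bayesian bounds, which hold for the Gibbs classifier, i.e., in expectation over the posterior. The sketch below evaluates such a bound for an isotropic Gaussian posterior over the weights of a toy network, using the closed-form KL divergence between Gaussians and a Monte Carlo estimate of the Gibbs risk; the McAllester-style form, architecture, prior, and constants are all assumptions for illustration and do not reproduce any specific paper.

```python
# Non-authoritative sketch: evaluating a classical (averaged) PAC-Bayes bound of the
# McAllester form  R(Q) <= R_hat(Q) + sqrt((KL(Q||P) + ln(2*sqrt(m)/delta)) / (2m))
# for an isotropic Gaussian posterior Q = N(w, sigma^2 I) over network weights.
# Architecture, prior, constants, and the Monte Carlo estimate of R_hat(Q) are assumptions.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(1)
m, d_in, d_hid = 1000, 20, 64                      # toy sample size and widths (assumed)
X = torch.randn(m, d_in)
y = (X[:, 0] > 0).long()                           # synthetic binary labels

# Posterior mean w (here an untrained toy network) and prior P = N(0, sigma0^2 I).
params = [torch.randn(d_hid, d_in) * 0.1, torch.zeros(d_hid),
          torch.randn(2, d_hid) * 0.1, torch.zeros(2)]
w = torch.cat([p.flatten() for p in params])
w0 = torch.zeros_like(w)
sigma, sigma0, delta = 0.05, 0.1, 0.05             # scales and confidence (assumed)
n = w.numel()

def zero_one_risk(eps):
    """Empirical 0-1 risk of one network drawn from Q, i.e. with weights w + sigma * eps."""
    chunks, start = [], 0
    for p in params:
        chunks.append(p + sigma * eps[start:start + p.numel()].view_as(p))
        start += p.numel()
    W1, b1, W2, b2 = chunks
    logits = F.linear(F.relu(F.linear(X, W1, b1)), W2, b2)
    return (logits.argmax(dim=1) != y).float().mean().item()

# Closed-form KL(Q || P) between the isotropic Gaussians N(w, sigma^2 I) and N(w0, sigma0^2 I).
kl = (n * sigma**2 / sigma0**2 + (w - w0).pow(2).sum().item() / sigma0**2
      - n + 2 * n * math.log(sigma0 / sigma)) / 2

# Monte Carlo estimate of the Gibbs risk R_hat(Q): average the 0-1 risk of sampled networks.
gibbs_risk = sum(zero_one_risk(torch.randn(n)) for _ in range(50)) / 50

bound = gibbs_risk + math.sqrt((kl + math.log(2 * math.sqrt(m) / delta)) / (2 * m))
print(f"Gibbs risk ~ {gibbs_risk:.3f}, KL = {kl:.1f}, averaged PAC-Bayes bound ~ {bound:.3f}")
```

Note the practical difference: the averaged bound has to be estimated by sampling many networks, whereas the disintegrated objective sketched earlier only involves the single hypothesis that is actually deployed, which is what makes it directly usable as a training and certification objective.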
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.