Improving Generalization of Complex Models under Unbounded Loss Using PAC-Bayes Bounds
- URL: http://arxiv.org/abs/2305.19243v3
- Date: Sun, 20 Oct 2024 01:50:51 GMT
- Title: Improving Generalization of Complex Models under Unbounded Loss Using PAC-Bayes Bounds
- Authors: Xitong Zhang, Avrajit Ghosh, Guangliang Liu, Rongrong Wang
- Abstract summary: PAC-Bayes learning theory has focused extensively on establishing tight upper bounds for test errors.
A recently proposed training procedure, called PAC-Bayes training, updates the model toward minimizing these bounds.
Although this approach is theoretically sound, in practice it has not achieved test errors as low as those obtained by empirical risk minimization (ERM).
We introduce a new PAC-Bayes training algorithm with improved performance and reduced reliance on prior tuning.
- Score: 10.94126149188336
- License:
- Abstract: Previous research on PAC-Bayes learning theory has focused extensively on establishing tight upper bounds for test errors. A recently proposed training procedure called PAC-Bayes training updates the model toward minimizing these bounds. Although this approach is theoretically sound, in practice it has not achieved test errors as low as those obtained by empirical risk minimization (ERM) with carefully tuned regularization hyperparameters. Additionally, existing PAC-Bayes training algorithms often require bounded loss functions and may need a search over priors with additional datasets, which limits their broader applicability. In this paper, we introduce a new PAC-Bayes training algorithm with improved performance and reduced reliance on prior tuning. This is achieved by establishing a new PAC-Bayes bound for unbounded loss and a theoretically grounded approach that involves jointly training the prior and posterior using the same dataset. Our comprehensive evaluations across various classification tasks and neural network architectures demonstrate that the proposed method not only outperforms existing PAC-Bayes training algorithms but also approximately matches the test accuracy of ERM optimized by SGD/Adam using various regularization methods with optimal hyperparameters.
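To make the generic recipe behind PAC-Bayes training concrete, the sketch below samples weights from a Gaussian posterior, evaluates the empirical risk, and adds a complexity penalty driven by the KL divergence to a prior whose scale is trained jointly with the posterior. The specific bound, loss function, and update rule here are simplified assumptions for illustration and do not reproduce the paper's exact algorithm.

```python
# Minimal, illustrative PAC-Bayes training loop (PyTorch). It optimizes a generic
# "empirical risk + KL complexity penalty" objective for a linear classifier with a
# Gaussian posterior; the prior scale is trained jointly with the posterior to echo
# the joint prior/posterior optimization described above. Not the paper's exact bound.
import math
import torch
import torch.nn.functional as F

n, d, k = 1000, 20, 3                                     # samples, input dim, classes
X, y = torch.randn(n, d), torch.randint(0, k, (n,))       # toy data

mu = torch.zeros(d, k, requires_grad=True)                # posterior mean
log_sigma = torch.full((d, k), -3.0, requires_grad=True)  # posterior log-std
log_prior_sigma = torch.zeros((), requires_grad=True)     # prior log-scale (trained jointly)

opt = torch.optim.Adam([mu, log_sigma, log_prior_sigma], lr=1e-2)
delta = 0.05                                              # confidence parameter

for step in range(2000):
    # One reparameterized sample from the posterior Q = N(mu, diag(sigma^2)).
    w = mu + torch.exp(log_sigma) * torch.randn_like(mu)
    emp_risk = F.cross_entropy(X @ w, y)

    # KL(Q || P) between diagonal Gaussians, with P = N(0, prior_sigma^2 I).
    post_var, prior_var = torch.exp(2 * log_sigma), torch.exp(2 * log_prior_sigma)
    kl = 0.5 * ((post_var + mu ** 2) / prior_var - 1
                - 2 * log_sigma + 2 * log_prior_sigma).sum()

    # Generic PAC-Bayes-style objective: risk + sqrt((KL + log(1/delta)) / (2n)).
    bound = emp_risk + torch.sqrt((kl + math.log(1 / delta)) / (2 * n))
    opt.zero_grad()
    bound.backward()
    opt.step()
```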
Related papers
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
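A minimal sketch of such a combined objective, assuming a DPO-style preference term and an illustrative weight `lam` (neither is taken from the cited paper), might look like:

```python
# Illustrative sketch of a preference-optimization loss combined with an SFT loss
# acting as a regularizer. The DPO-style preference term and the weight `lam` are
# assumptions for illustration, not the cited paper's exact objective.
import torch
import torch.nn.functional as F

def combined_loss(logp_chosen, logp_rejected,            # policy log-probs of responses
                  ref_logp_chosen, ref_logp_rejected,    # reference-model log-probs
                  beta: float = 0.1, lam: float = 0.5):
    # Preference term: push the policy to prefer the chosen over the rejected
    # response relative to the reference model (DPO-style logistic loss).
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    pref_loss = -F.logsigmoid(margin).mean()

    # SFT term: negative log-likelihood of the chosen responses, acting as a
    # regularizer against over-optimizing the preference/reward signal.
    sft_loss = -logp_chosen.mean()

    return pref_loss + lam * sft_loss
```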
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
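As a rough illustration of gradient-based sparse fine-tuning, the sketch below updates only the parameter entries with the largest gradient magnitudes, so the increment applied to the pretrained weights is sparse; the selection rule and sparsity level are assumptions, not the exact SIFT procedure.

```python
# Illustrative sparse fine-tuning step: keep only the top `density` fraction of
# gradient entries (by magnitude) and zero out the rest before the update.
# The selection rule and sparsity level are assumptions for illustration.
import torch

def sparse_update_(param: torch.Tensor, lr: float = 1e-4, density: float = 0.01):
    grad = param.grad
    if grad is None:
        return
    k = max(1, int(density * grad.numel()))
    # Threshold equal to the k-th largest absolute gradient value.
    threshold = grad.abs().flatten().kthvalue(grad.numel() - k + 1).values
    mask = grad.abs() >= threshold
    with torch.no_grad():
        param -= lr * grad * mask      # sparse increment applied in place
```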
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent [11.866227238721939]
We propose a two-stage fine-tuning method, PAC-tuning, to address this optimization challenge.
First, PAC-tuning directly minimizes the PAC-Bayes bound to learn a proper parameter distribution.
Second, PAC-tuning modifies the gradient by injecting noise, with the variance learned in the first stage, into the model parameters during training, resulting in a variant of perturbed gradient descent.
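A minimal sketch of that second stage, assuming per-parameter noise standard deviations `sigma` learned in the first stage, is given below; how the variances are learned and how the perturbation interacts with the optimizer in PAC-tuning may differ.

```python
# Illustrative perturbed-gradient-descent step: parameters are perturbed with
# Gaussian noise whose per-parameter std (`sigma`) is assumed to come from a
# first stage that minimized a PAC-Bayes bound. Details here are assumptions.
import torch

def perturbed_sgd_step(model, sigma, loss_fn, batch, lr=1e-3):
    # Perturb the parameters, compute gradients at the perturbed point,
    # then remove the perturbation and update the clean parameters.
    noise = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            noise[name] = sigma[name] * torch.randn_like(p)
            p.add_(noise[name])

    model.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()

    with torch.no_grad():
        for name, p in model.named_parameters():
            p.sub_(noise[name])        # undo the perturbation
            p.sub_(lr * p.grad)        # SGD update with the perturbed gradient
    return loss.item()
```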
arXiv Detail & Related papers (2023-10-26T17:09:13Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
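The "UCB-type bonus" idea itself, an optimistic term added to an empirical estimate that upper-bounds its uncertainty, can be shown with a plain multi-armed bandit sketch; this is only a generic illustration, not the PSR algorithm or its total-variation-based bonus term.

```python
# Generic UCB-style action selection for a multi-armed bandit: an optimism bonus
# that upper-bounds the uncertainty of each empirical estimate is added before
# taking the argmax. Generic illustration only; not the cited PSR algorithm.
import math

def ucb_select(counts, means, t, c=1.0):
    # counts[a]: pulls of arm a; means[a]: empirical mean reward; t: round index.
    for a, n_a in enumerate(counts):
        if n_a == 0:
            return a                                     # try every arm once first
    scores = [mu_a + c * math.sqrt(2 * math.log(t) / n_a)
              for mu_a, n_a in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```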
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization [27.31806334022094]
Recent research has shown an overfitting-like phenomenon in which models trained against adversarial attacks exhibit higher robustness on the training set compared to the test set.
In this paper we consider a different form of the robust PAC-Bayesian bound and directly minimize it with respect to the model posterior.
We evaluate our TrH regularization approach over CIFAR-10/100 and ImageNet using Vision Transformers (ViT) and compare against baseline adversarial robustness algorithms.
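One generic way to penalize the trace of the loss Hessian is a Hutchinson estimator with double backpropagation, sketched below; the cited paper derives a more specific TrH regularizer (e.g., restricted to the top layer), so this is only an illustrative assumption.

```python
# Illustrative trace-of-Hessian (TrH) penalty estimated with Hutchinson probes:
# for Rademacher vectors v, E[v^T H v] = tr(H). The cited paper uses a more
# specific TrH regularizer, so this generic sketch is an assumption.
import torch

def trh_penalty(loss, params, n_probes=1):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    trace_est = 0.0
    for _ in range(n_probes):
        # Rademacher probe vectors with entries in {-1, +1}.
        vs = [(torch.rand_like(g) < 0.5).to(g.dtype) * 2 - 1 for g in grads]
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(gv, params, create_graph=True)  # Hessian-vector product
        trace_est = trace_est + sum((h * v).sum() for h, v in zip(hvs, vs))
    return trace_est / n_probes

# Usage sketch: total_loss = task_loss + rho * trh_penalty(task_loss, list(model.parameters()))
```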
arXiv Detail & Related papers (2022-11-22T23:12:00Z)
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
- A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are limited by being model-based or by lacking worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z)
- Conditional Gaussian PAC-Bayes [19.556744028461004]
The present paper proposes a novel training algorithm that optimises the PAC-Bayesian bound, without relying on any surrogate loss.
Empirical results show that the bounds obtained with this approach are tighter than those found in the literature.
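One way to avoid a surrogate loss is to exploit the fact that, for a linear binary classifier with a Gaussian weight posterior, the posterior-expected 0-1 loss has a closed form through the Gaussian CDF and can therefore be minimized directly; the sketch below illustrates this idea and is not the cited paper's exact construction.

```python
# Sketch: for a linear binary classifier with Gaussian weight posterior
# N(mu, diag(sigma^2)), the posterior-expected 0-1 loss has a closed form via the
# Gaussian CDF, so it can be optimized without a surrogate loss.
# Illustrative assumption, not the cited paper's exact construction.
import torch

def expected_01_loss(X, y, mu, log_sigma):
    # X: (n, d); y: (n,) with labels in {-1, +1}; mu, log_sigma: (d,)
    sigma2 = torch.exp(2 * log_sigma)                          # posterior variances
    mean_score = X @ mu                                        # E[w . x]
    std_score = torch.sqrt((X ** 2 * sigma2).sum(dim=1) + 1e-12)  # sqrt(Var[w . x])
    normal = torch.distributions.Normal(0.0, 1.0)
    # P(y * w.x <= 0) = Phi(-y * E[w.x] / std), averaged over the dataset.
    return normal.cdf(-y * mean_score / std_score).mean()
```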
arXiv Detail & Related papers (2021-10-22T16:12:03Z)
- PAC-Bayes Bounds for Meta-learning with Data-Dependent Prior [36.38937352131301]
We derive three novel generalisation error bounds for meta-learning based on the PAC-Bayes relative entropy bound.
Experiments illustrate that the three proposed PAC-Bayes bounds for meta-learning provide competitive generalization performance guarantees.
arXiv Detail & Related papers (2021-02-07T09:03:43Z)
- PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees [77.67258935234403]
We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning.
We develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-level regularization.
arXiv Detail & Related papers (2020-02-13T15:01:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.