Unlocking Tuning-free Generalization: Minimizing the PAC-Bayes Bound
with Trainable Priors
- URL: http://arxiv.org/abs/2305.19243v2
- Date: Sun, 1 Oct 2023 22:36:36 GMT
- Title: Unlocking Tuning-free Generalization: Minimizing the PAC-Bayes Bound
with Trainable Priors
- Authors: Xitong Zhang, Avrajit Ghosh, Guangliang Liu and Rongrong Wang
- Abstract summary: The PAC-Bayes training framework is nearly tuning-free and requires no additional regularization.
Our proposed algorithm demonstrates the remarkable potential of PAC-Bayes training to achieve state-of-the-art performance on deep neural networks.
- Score: 11.952542165016222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is widely recognized that the generalization ability of neural networks
can be greatly enhanced through carefully designing the training procedure. The
current state-of-the-art training approach involves utilizing stochastic
gradient descent (SGD) or Adam optimization algorithms along with a combination
of additional regularization techniques such as weight decay, dropout, or noise
injection. Optimal generalization can only be achieved by tuning a multitude of
hyperparameters through grid search, which is time-consuming and requires
additional validation datasets. To address this issue, we
introduce a practical PAC-Bayes training framework that is nearly tuning-free
and requires no additional regularization, while achieving testing performance
comparable to that of SGD/Adam after a complete grid search and with extra
regularization. Our proposed algorithm demonstrates the remarkable potential
of PAC-Bayes training to achieve state-of-the-art performance on deep neural networks
with enhanced robustness and interpretability.
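To make the idea concrete, below is a minimal, self-contained sketch of what training by directly minimizing a PAC-Bayes bound can look like, using a diagonal Gaussian posterior centered at the weights and a Gaussian prior with a trainable scalar scale. The toy model, the specific McAllester-style bound, and every name and constant in the sketch are illustrative assumptions; this is not the authors' algorithm or released code.
```python
# Illustrative sketch only (assumptions, not the paper's implementation):
# minimize empirical risk of a posterior weight sample plus a
# McAllester-style PAC-Bayes complexity term, with a trainable prior scale.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, k = 512, 20, 3                                   # training-set size, input dim, classes
X, y = torch.randn(n, d), torch.randint(0, k, (n,))    # toy data (assumption)

w = torch.zeros(d, k, requires_grad=True)                    # posterior mean = the weights
log_sigma_q = torch.full((d, k), -3.0, requires_grad=True)   # per-parameter posterior log-std
log_sigma_p = torch.tensor(-3.0, requires_grad=True)         # trainable scalar prior log-std
w0 = w.detach().clone()                                      # prior mean = initial weights
delta = 0.05                                                 # confidence level

opt = torch.optim.Adam([w, log_sigma_q, log_sigma_p], lr=1e-2)
for step in range(2000):
    sigma_q, sigma_p = log_sigma_q.exp(), log_sigma_p.exp()

    # Empirical risk of one posterior sample (reparameterization trick).
    w_noisy = w + sigma_q * torch.randn_like(w)
    emp_loss = F.cross_entropy(X @ w_noisy, y)

    # KL( N(w, sigma_q^2) || N(w0, sigma_p^2) ) for diagonal Gaussians.
    kl = 0.5 * ((sigma_q ** 2 + (w - w0) ** 2) / sigma_p ** 2
                - 1.0 + 2.0 * (log_sigma_p - log_sigma_q)).sum()

    # McAllester-style complexity term: sqrt((KL + ln(2*sqrt(n)/delta)) / (2n)).
    penalty = torch.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))

    loss = emp_loss + penalty        # weights, noise scales, and prior scale trained jointly
    opt.zero_grad()
    loss.backward()
    opt.step()
```
The point of such an objective is that the regularization strength is not a hyperparameter to grid-search: the posterior noise and the prior scale are learned by the same optimizer as the weights. Note that a data-dependent (trainable) prior generally requires extra care, e.g., a union bound over a grid of prior scales, to keep the bound valid; this sketch ignores that detail.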
Related papers
- Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation [67.80294336559574]
Continual Test Time Adaptation (CTTA) is a task that requires a source pre-trained model to continually adapt to new scenarios.
We propose a novel pipeline, Orthogonal Projection Subspace to aggregate online Prior-knowledge, dubbed OoPk.
arXiv Detail & Related papers (2025-06-23T18:17:39Z)
- EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time.
We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent [11.866227238721939]
We propose a two-stage fine-tuning method, PAC-tuning, to address this optimization challenge.
First, PAC-tuning directly minimizes the PAC-Bayes bound to learn a proper parameter distribution.
Second, PAC-tuning modifies the gradient by injecting noise, with the variance learned in the first stage, into the model parameters during training, resulting in a variant of perturbed gradient descent (a minimal sketch of this second stage appears after this list).
arXiv Detail & Related papers (2023-10-26T17:09:13Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization [27.31806334022094]
Recent research has shown an overfitting-like phenomenon in which models trained against adversarial attacks exhibit higher robustness on the training set compared to the test set.
In this paper we consider a different form of the robust PAC-Bayesian bound and directly minimize it with respect to the model posterior.
We evaluate our TrH (trace-of-Hessian) regularization approach over CIFAR-10/100 and ImageNet using Vision Transformers (ViT) and compare against baseline adversarial robustness algorithms.
arXiv Detail & Related papers (2022-11-22T23:12:00Z)
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
- A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are either model-based or lack worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z)
- Conditional Gaussian PAC-Bayes [19.556744028461004]
The present paper proposes a novel training algorithm that optimises the PAC-Bayesian bound, without relying on any surrogate loss.
Empirical results show that the bounds obtained with this approach are tighter than those found in the literature.
arXiv Detail & Related papers (2021-10-22T16:12:03Z)
- PAC-Bayes Bounds for Meta-learning with Data-Dependent Prior [36.38937352131301]
We derive three novel generalisation error bounds for meta-learning based on the PAC-Bayes relative entropy bound.
Experiments illustrate that the three proposed PAC-Bayes bounds for meta-learning provide competitive generalization performance guarantees.
arXiv Detail & Related papers (2021-02-07T09:03:43Z)
- PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees [77.67258935234403]
We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning.
We develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-level regularization.
arXiv Detail & Related papers (2020-02-13T15:01:38Z)
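Complementing the PAC-tuning entry above, the following is a small illustrative sketch (assumed details, not the paper's implementation) of the second-stage perturbed gradient descent it describes: Gaussian noise, whose per-parameter standard deviation is assumed here to have been learned in the first stage, is injected into the weights before each gradient evaluation. The toy model, the fixed `sigma` values, and the plain SGD update are assumptions for illustration.
```python
# Illustrative sketch only: stage-2 perturbed gradient descent with a
# per-parameter noise std `sigma` assumed to come from a PAC-Bayes stage 1.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, k = 256, 20, 3
X, y = torch.randn(n, d), torch.randint(0, k, (n,))   # toy data (assumption)

w = torch.zeros(d, k, requires_grad=True)   # parameters being fine-tuned
sigma = torch.full((d, k), 0.05)            # stage-1 noise stds, assumed given and fixed here
opt = torch.optim.SGD([w], lr=0.1)

for step in range(500):
    # Evaluate the loss at a noisy copy of the weights; the gradient w.r.t. `w`
    # is then a perturbed gradient, with noise scale set by the PAC-Bayes stage.
    noise = sigma * torch.randn_like(w)
    loss = F.cross_entropy(X @ (w + noise), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```
Evaluating the loss at a noisy copy of the weights turns the update into a perturbed gradient step, so the regularization effect comes from the bound-derived noise variance rather than from a hand-tuned penalty.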
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.