Tighter PAC-Bayes Generalisation Bounds by Leveraging Example Difficulty
- URL: http://arxiv.org/abs/2210.11289v1
- Date: Thu, 20 Oct 2022 14:14:52 GMT
- Title: Tighter PAC-Bayes Generalisation Bounds by Leveraging Example Difficulty
- Authors: Felix Biggs, Benjamin Guedj
- Abstract summary: We introduce a modified version of the excess risk.
It can be used to obtain tighter, fast-rate PAC-Bayesian generalisation bounds.
We empirically evaluate these new bounds on a number of real-world datasets.
- Score: 5.799808780731661
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a modified version of the excess risk, which can be used to
obtain tighter, fast-rate PAC-Bayesian generalisation bounds. This modified
excess risk leverages information about the relative hardness of data examples
to reduce the variance of its empirical counterpart, tightening the bound. We
combine this with a new bound for $[-1, 1]$-valued (and potentially
non-independent) signed losses, which is more favourable when they empirically
have low variance around $0$. The primary new technical tool is a novel result
for sequences of interdependent random vectors which may be of independent
interest. We empirically evaluate these new bounds on a number of real-world
datasets.
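For context, the kind of bound being tightened can be sketched (in generic notation, not the paper's own) by the classical PAC-Bayes-kl inequality: for a prior $P$, posterior $Q$, $n$ i.i.d. samples and a $[0,1]$-valued loss, with probability at least $1-\delta$,

```latex
\mathrm{kl}\!\left(\hat{L}(Q)\,\middle\|\,L(Q)\right)
  \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{n},
```

where $\hat{L}(Q)$ is the empirical risk and $L(Q)$ the population risk. The paper's idea is to replace the raw loss with a centred "excess" loss whose empirical variance is smaller, which tightens fast-rate variants of such bounds.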
Related papers
- PAC-Bayes-Chernoff bounds for unbounded losses [9.987130158432755]
We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting.
Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds.
arXiv Detail & Related papers (2024-01-02T10:58:54Z) - Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds [12.699376765058137]
Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning.
It is only very recently that SGD with sampling without replacement -- shuffled SGD -- has been analyzed.
We prove fine-grained complexity bounds that depend on the data matrix and are never worse than what is predicted by the existing bounds.
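As a toy illustration of the sampling scheme this entry analyzes (not the paper's primal-dual method), shuffled SGD can be sketched on a noiseless least-squares problem; the setup and parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Least-squares problem: minimise (1/2n) * ||Ax - b||^2
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true  # noiseless, so SGD can converge to x_true exactly

def shuffled_sgd(A, b, lr=0.01, epochs=50):
    """SGD with sampling *without* replacement: each epoch draws a
    fresh random permutation and visits every example exactly once."""
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):  # shuffle, then sweep
            grad = (A[i] @ x - b[i]) * A[i]
            x -= lr * grad
    return x

x_hat = shuffled_sgd(A, b)
err = np.linalg.norm(x_hat - x_true)
print(f"recovery error: {err:.2e}")
```

With-replacement SGD would instead draw `i = rng.integers(n)` at every step; the shuffled variant is what most deep-learning loops actually do, and is the version whose complexity the paper bounds.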
arXiv Detail & Related papers (2023-06-21T18:14:44Z) - Fast Rate Information-theoretic Bounds on Generalization Errors [8.102199960821165]
The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data.
Various information-theoretic bounds on the generalization error have been derived in the literature.
This paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size.
arXiv Detail & Related papers (2023-03-26T08:59:05Z) - Tighter Information-Theoretic Generalization Bounds from Supersamples [27.14107452619853]
We present a variety of novel information-theoretic generalization bounds for learning algorithms.
The presented bounds include square-root bounds and fast-rate bounds, including variants based on variance and sharpness.
We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date on the same supersample setting.
arXiv Detail & Related papers (2023-02-05T17:06:27Z) - Foolish Crowds Support Benign Overfitting [20.102619493827024]
We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data.
Our analysis exposes the benefit of the "wisdom of the crowd", except here the harm arising from fitting the noise is ameliorated by spreading it among many directions.
arXiv Detail & Related papers (2021-10-06T16:56:37Z) - Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization [53.592597682854944]
We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions.
We show that ERM is provably minimax-optimal for both tasks.
arXiv Detail & Related papers (2021-02-25T19:06:48Z) - Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z) - Multi-label Contrastive Predictive Coding [125.03510235962095]
Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC).
We introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time.
We show that using the same amount of negative samples, multi-label CPC is able to exceed the $\log m$ bound, while still being a valid lower bound of mutual information.
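The $\log m$ cap that multi-label CPC is designed to exceed can be seen in a minimal numpy sketch of the standard single-label CPC/InfoNCE estimator (function name and synthetic scores are illustrative, not from the paper):

```python
import numpy as np

def infonce_estimate(scores):
    """Single-label CPC/InfoNCE mutual-information estimate.

    scores: (n, m) critic scores, where column 0 holds the positive
    sample and columns 1..m-1 hold negatives. Returns the average of
    log(m * softmax(scores)[:, 0]), which can never exceed log(m).
    """
    n, m = scores.shape
    # numerically stable log-sum-exp over each row
    mx = scores.max(axis=1, keepdims=True)
    lse = mx[:, 0] + np.log(np.exp(scores - mx).sum(axis=1))
    return np.mean(scores[:, 0] - lse + np.log(m))

rng = np.random.default_rng(0)
scores = rng.normal(size=(1000, 8))
scores[:, 0] += 5.0  # positives score higher, as a trained critic would
print(f"estimate: {infonce_estimate(scores):.3f} <= log(m) = {np.log(8):.3f}")
```

Because the positive's exponentiated score is one term of the row's log-sum-exp, each per-row value is at most $\log m$; the multi-label variant summarized above sidesteps this saturation by identifying several positives jointly.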
arXiv Detail & Related papers (2020-07-20T02:46:21Z) - Relative Deviation Margin Bounds [55.22251993239944]
We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity.
We derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment.
arXiv Detail & Related papers (2020-06-26T12:37:17Z) - The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.