Related papers: Learning via Wasserstein-Based High Probability Generalisation Bounds

Learning via Wasserstein-Based High Probability Generalisation Bounds

URL: http://arxiv.org/abs/2306.04375v2
Date: Fri, 27 Oct 2023 08:08:56 GMT
Title: Learning via Wasserstein-Based High Probability Generalisation Bounds
Authors: Paul Viallard, Maxime Haddouche, Umut \c{S}im\c{s}ekli, Benjamin Guedj
Abstract summary: Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation. Most bounds involve a Kullback-Leibler (KL) divergence term (or its variations) Recent studies have attempted to replace the KL divergence in the PAC-Bayesian bounds with the Wasserstein distance.
Score: 16.74864438507713
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation (SRM) -- this is in particular at the core of PAC-Bayesian learning. Despite its successes and unfailing surge of interest in recent years, a limitation of the PAC-Bayesian framework is that most bounds involve a Kullback-Leibler (KL) divergence term (or its variations), which might exhibit erratic behavior and fail to capture the underlying geometric structure of the learning problem -- hence restricting its use in practical applications. As a remedy, recent studies have attempted to replace the KL divergence in the PAC-Bayesian bounds with the Wasserstein distance. Even though these bounds alleviated the aforementioned issues to a certain extent, they either hold in expectation, are for bounded losses, or are nontrivial to minimize in an SRM framework. In this work, we contribute to this line of research and prove novel Wasserstein distance-based PAC-Bayesian generalisation bounds for both batch learning with independent and identically distributed (i.i.d.) data, and online learning with potentially non-i.i.d. data. Contrary to previous art, our bounds are stronger in the sense that (i) they hold with high probability, (ii) they apply to unbounded (potentially heavy-tailed) losses, and (iii) they lead to optimizable training objectives that can be used in SRM. As a result we derive novel Wasserstein-based PAC-Bayesian learning algorithms and we illustrate their empirical advantage on a variety of experiments.

Related papers

Multi-View Majority Vote Learning Algorithms: Direct Minimization of PAC-Bayesian Bounds [0.8039067099377079]
We extend PAC-Bayesian theory to multi-view learning, introducing novel generalization bounds based on R'enyi divergence. These bounds provide an alternative to traditional Kullback-Leibler divergence-based counterparts, leveraging the flexibility of R'enyi divergence. We also propose first- and second-order oracle PAC-Bayesian bounds and extend the C-bound to multi-view settings.
arXiv Detail & Related papers (2024-11-09T20:25:47Z)
PAC-Bayes-Chernoff bounds for unbounded losses [9.987130158432755]
We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cram'er-Chernoff bounds to the PAC-Bayesian setting. Our approach naturally leverages properties of Cram'er-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds.
arXiv Detail & Related papers (2024-01-02T10:58:54Z)
An Information-Theoretic Analysis of Bayesian Reinforcement Learning [44.025369660607645]
We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent. We show that our bounds can recover from below the current information-theoretic bounds by Russo and Van Roy.
arXiv Detail & Related papers (2022-07-18T16:28:01Z)
On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI) Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation. We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality [57.91411772725183]
In this paper, we consider the offline shortest path problem when the state space and the action space are finite. We design the simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning tasks. Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
arXiv Detail & Related papers (2022-06-10T07:44:56Z)
Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$. In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially. Our results show that a short-term memory suffices for reinforcement learning in these environments.
arXiv Detail & Related papers (2022-02-08T16:39:57Z)
Constrained Learning with Non-Convex Losses [119.8736858597118]
Though learning has become a core technology of modern information processing, there is now ample evidence that it can lead to biased, unsafe, and prejudiced solutions.
arXiv Detail & Related papers (2021-03-08T23:10:33Z)
Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds. We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z)
PAC-Bayes Analysis Beyond the Usual Bounds [16.76187007910588]
We focus on a learning model where the learner observes a finite set of training examples. The learned data-dependent distribution is then used to make randomized predictions.
arXiv Detail & Related papers (2020-06-23T14:30:24Z)
PAC-Bayes unleashed: generalisation bounds with unbounded losses [12.078257783674923]
We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions. This extends the relevance and applicability of the PAC-Bayes learning framework.
arXiv Detail & Related papers (2020-06-12T15:55:46Z)
Probably Approximately Correct Constrained Learning [135.48447120228658]
We develop a generalization theory based on the probably approximately correct (PAC) learning framework. We show that imposing a learner does not make a learning problem harder in the sense that any PAC learnable class is also a constrained learner. We analyze the properties of this solution and use it to illustrate how constrained learning can address problems in fair and robust classification.
arXiv Detail & Related papers (2020-06-09T19:59:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.