Learning via Wasserstein-Based High Probability Generalisation Bounds
- URL: http://arxiv.org/abs/2306.04375v2
- Date: Fri, 27 Oct 2023 08:08:56 GMT
- Title: Learning via Wasserstein-Based High Probability Generalisation Bounds
- Authors: Paul Viallard, Maxime Haddouche, Umut \c{S}im\c{s}ekli, Benjamin Guedj
- Abstract summary: Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation.
Most bounds involve a Kullback-Leibler (KL) divergence term (or its variations)
Recent studies have attempted to replace the KL divergence in the PAC-Bayesian bounds with the Wasserstein distance.
- Score: 16.74864438507713
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Minimising upper bounds on the population risk or the generalisation gap has
been widely used in structural risk minimisation (SRM) -- this is in particular
at the core of PAC-Bayesian learning. Despite its successes and unfailing surge
of interest in recent years, a limitation of the PAC-Bayesian framework is that
most bounds involve a Kullback-Leibler (KL) divergence term (or its
variations), which might exhibit erratic behavior and fail to capture the
underlying geometric structure of the learning problem -- hence restricting its
use in practical applications. As a remedy, recent studies have attempted to
replace the KL divergence in the PAC-Bayesian bounds with the Wasserstein
distance. Even though these bounds alleviated the aforementioned issues to a
certain extent, they either hold in expectation, are for bounded losses, or are
nontrivial to minimize in an SRM framework. In this work, we contribute to this
line of research and prove novel Wasserstein distance-based PAC-Bayesian
generalisation bounds for both batch learning with independent and identically
distributed (i.i.d.) data, and online learning with potentially non-i.i.d.
data. Contrary to previous art, our bounds are stronger in the sense that (i)
they hold with high probability, (ii) they apply to unbounded (potentially
heavy-tailed) losses, and (iii) they lead to optimizable training objectives
that can be used in SRM. As a result we derive novel Wasserstein-based
PAC-Bayesian learning algorithms and we illustrate their empirical advantage on
a variety of experiments.
Related papers
- It begins with a boundary: A geometric view on probabilistically robust
learning [2.0388938295521575]
We take a fresh and geometric view on one such method -- Probabilistically Robust Learning (PRL)
We propose a geometric framework for understanding PRL, which allows us to identify a subtle flaw in its original formulation.
We prove existence of solutions using novel relaxation methods and study properties as well as local limits of the introduced perimeters.
arXiv Detail & Related papers (2023-05-30T06:24:30Z) - An Information-Theoretic Analysis of Bayesian Reinforcement Learning [44.025369660607645]
We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent.
We show that our bounds can recover from below the current information-theoretic bounds by Russo and Van Roy.
arXiv Detail & Related papers (2022-07-18T16:28:01Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI)
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - Offline Stochastic Shortest Path: Learning, Evaluation and Towards
Optimality [57.91411772725183]
In this paper, we consider the offline shortest path problem when the state space and the action space are finite.
We design the simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning tasks.
Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
arXiv Detail & Related papers (2022-06-10T07:44:56Z) - Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially.
Our results show that a short-term memory suffices for reinforcement learning in these environments.
arXiv Detail & Related papers (2022-02-08T16:39:57Z) - Constrained Learning with Non-Convex Losses [119.8736858597118]
Though learning has become a core technology of modern information processing, there is now ample evidence that it can lead to biased, unsafe, and prejudiced solutions.
arXiv Detail & Related papers (2021-03-08T23:10:33Z) - Instance-Dependent Complexity of Contextual Bandits and Reinforcement
Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z) - PAC-Bayes Analysis Beyond the Usual Bounds [16.76187007910588]
We focus on a learning model where the learner observes a finite set of training examples.
The learned data-dependent distribution is then used to make randomized predictions.
arXiv Detail & Related papers (2020-06-23T14:30:24Z) - PAC-Bayes unleashed: generalisation bounds with unbounded losses [12.078257783674923]
We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions.
This extends the relevance and applicability of the PAC-Bayes learning framework.
arXiv Detail & Related papers (2020-06-12T15:55:46Z) - Probably Approximately Correct Constrained Learning [135.48447120228658]
We develop a generalization theory based on the probably approximately correct (PAC) learning framework.
We show that imposing a learner does not make a learning problem harder in the sense that any PAC learnable class is also a constrained learner.
We analyze the properties of this solution and use it to illustrate how constrained learning can address problems in fair and robust classification.
arXiv Detail & Related papers (2020-06-09T19:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.