Towards a Unified Information-Theoretic Framework for Generalization
- URL: http://arxiv.org/abs/2111.05275v1
- Date: Tue, 9 Nov 2021 17:09:40 GMT
- Title: Towards a Unified Information-Theoretic Framework for Generalization
- Authors: Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy
- Abstract summary: We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm.
We prove that the CMI framework yields the optimal bound on the expected risk of Support Vector Machines (SVMs) for learning halfspaces.
- Score: 43.23033319683591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate the expressiveness of the "conditional mutual
information" (CMI) framework of Steinke and Zakynthinou (2020) and the prospect
of using it to provide a unified framework for proving generalization bounds in
the realizable setting. We first demonstrate that one can use this framework to
express non-trivial (but sub-optimal) bounds for any learning algorithm that
outputs hypotheses from a class of bounded VC dimension. We prove that the CMI
framework yields the optimal bound on the expected risk of Support Vector
Machines (SVMs) for learning halfspaces. This result is an application of our
general result showing that stable compression schemes Bousquet al. (2020) of
size $k$ have uniformly bounded CMI of order $O(k)$. We further show that an
inherent limitation of proper learning of VC classes contradicts the existence
of a proper learner with constant CMI, and it implies a negative resolution to
an open problem of Steinke and Zakynthinou (2020). We further study the CMI of
empirical risk minimizers (ERMs) of class $H$ and show that it is possible to
output all consistent classifiers (version space) with bounded CMI if and only
if $H$ has a bounded star number (Hanneke and Yang (2015)). Moreover, we prove
a general reduction showing that "leave-one-out" analysis is expressible via
the CMI framework. As a corollary we investigate the CMI of the
one-inclusion-graph algorithm proposed by Haussler et al. (1994). More
generally, we show that the CMI framework is universal in the sense that for
every consistent algorithm and data distribution, the expected risk vanishes as
the number of samples diverges if and only if its evaluated CMI has sublinear
growth with the number of samples.
Related papers
- Memetic Differential Evolution Methods for Semi-Supervised Clustering [0.8681835475119588]
We propose an extension for semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems of MDEClust.
Our new framework, called S-MDEClust, represents the first memetic methodology designed to generate an optimal feasible solution.
arXiv Detail & Related papers (2024-03-07T08:37:36Z) - Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization [36.28082578445787]
We investigate the interplay between memorization and learning in the context of emphstochastic convex optimization (SCO)
We quantify memorization using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou ( 2020)
We show that, in the $L2$ Lipschitz--bounded setting and under strong convexity, every learner with an excess error has CMI bounded below by $Omega (1/varepsilon2)$ and $Omega (1/varepsilon)$, respectively.
arXiv Detail & Related papers (2024-02-14T17:17:30Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Tight Guarantees for Interactive Decision Making with the
Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - Evaluated CMI Bounds for Meta Learning: Tightness and Expressiveness [14.147617330278662]
We present novel generalization bounds for meta learning in terms of the evaluated CMI (e-CMI)
We show that the e-CMI framework yields a bound that scales as $sqrt mathcal C(mathcal H)/(nhat n) + mathcal C(mathcal F)/n $, where $mathcal C(cdot)$ denotes a complexity measure of the hypothesis class.
arXiv Detail & Related papers (2022-10-12T18:10:59Z) - Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI)
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - Understanding Generalization via Leave-One-Out Conditional Mutual
Information [37.49575289458763]
leave-one-out variants of the conditional mutual information (CMI) of an algorithm are seen to control the mean generalization error of learning algorithms with bounded loss functions.
For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk.
arXiv Detail & Related papers (2022-06-29T17:57:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.