Towards a Unified Information-Theoretic Framework for Generalization
- URL: http://arxiv.org/abs/2111.05275v1
- Date: Tue, 9 Nov 2021 17:09:40 GMT
- Title: Towards a Unified Information-Theoretic Framework for Generalization
- Authors: Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy
- Abstract summary: We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm.
We prove that the CMI framework yields the optimal bound on the expected risk of Support Vector Machines (SVMs) for learning halfspaces.
- Score: 43.23033319683591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate the expressiveness of the "conditional mutual
information" (CMI) framework of Steinke and Zakynthinou (2020) and the prospect
of using it to provide a unified framework for proving generalization bounds in
the realizable setting. We first demonstrate that one can use this framework to
express non-trivial (but sub-optimal) bounds for any learning algorithm that
outputs hypotheses from a class of bounded VC dimension. We prove that the CMI
framework yields the optimal bound on the expected risk of Support Vector
Machines (SVMs) for learning halfspaces. This result is an application of our
general result showing that stable compression schemes (Bousquet et al. (2020)) of
size $k$ have uniformly bounded CMI of order $O(k)$. We further show that an
inherent limitation of proper learning of VC classes contradicts the existence
of a proper learner with constant CMI, and it implies a negative resolution to
an open problem of Steinke and Zakynthinou (2020). We further study the CMI of
empirical risk minimizers (ERMs) of class $H$ and show that it is possible to
output all consistent classifiers (version space) with bounded CMI if and only
if $H$ has a bounded star number (Hanneke and Yang (2015)). Moreover, we prove
a general reduction showing that "leave-one-out" analysis is expressible via
the CMI framework. As a corollary we investigate the CMI of the
one-inclusion-graph algorithm proposed by Haussler et al. (1994). More
generally, we show that the CMI framework is universal in the sense that for
every consistent algorithm and data distribution, the expected risk vanishes as
the number of samples diverges if and only if its evaluated CMI has sublinear
growth with the number of samples.
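For reference, a brief recap of the quantity at the heart of this work, the CMI of Steinke and Zakynthinou (2020). This is a sketch of the standard setup; the risk notation $R_{\mathcal{D}}$ and $\widehat{R}$ below is ours, and the exact conditions and constants are as in the original paper.

```latex
% CMI setup of Steinke and Zakynthinou (2020), recalled for reference.
% \tilde{Z} \in \mathcal{Z}^{n \times 2} is a "supersample" of 2n i.i.d. draws from \mathcal{D};
% S \in \{0,1\}^n is uniform and independent of \tilde{Z}; the algorithm A is trained on the
% selected half \tilde{Z}_S = (\tilde{Z}_{i,S_i})_{i=1}^n.
\[
  \mathrm{CMI}_{\mathcal{D}}(A) \;=\; I\bigl(A(\tilde{Z}_S);\, S \,\big|\, \tilde{Z}\bigr).
\]
% For losses bounded in [0,1], their generalization bound reads
\[
  \Bigl|\,\mathbb{E}\bigl[R_{\mathcal{D}}(A(\tilde{Z}_S)) - \widehat{R}_{\tilde{Z}_S}(A(\tilde{Z}_S))\bigr]\Bigr|
  \;\le\; \sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(A)}{n}}.
\]
```

The "evaluated" CMI referenced in the universality statement above measures information through the losses of the output hypothesis on the supersample rather than through the hypothesis itself, conditioned on the same supersample.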
Related papers
- Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization [36.28082578445787]
We investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO).
We quantify memorization using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020).
We show that, in the $L^2$-Lipschitz-bounded setting and under strong convexity, every learner with excess error $\varepsilon$ has CMI bounded below by $\Omega(1/\varepsilon^2)$ and $\Omega(1/\varepsilon)$, respectively.
arXiv Detail & Related papers (2024-02-14T17:17:30Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Tight Guarantees for Interactive Decision Making with the
Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - Evaluated CMI Bounds for Meta Learning: Tightness and Expressiveness [14.147617330278662]
We present novel generalization bounds for meta learning in terms of the evaluated CMI (e-CMI).
We show that the e-CMI framework yields a bound that scales as $\sqrt{\mathcal{C}(\mathcal{H})/(n\hat{n}) + \mathcal{C}(\mathcal{F})/n}$, where $\mathcal{C}(\cdot)$ denotes a complexity measure of the hypothesis class.
arXiv Detail & Related papers (2022-10-12T18:10:59Z) - Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI)
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - Understanding Generalization via Leave-One-Out Conditional Mutual
Information [37.49575289458763]
Leave-one-out variants of the conditional mutual information (CMI) of an algorithm are seen to control the mean generalization error of learning algorithms with bounded loss functions.
For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk.
arXiv Detail & Related papers (2022-06-29T17:57:37Z) - Reasoning About Generalization via Conditional Mutual Information [26.011933885798506]
We use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the output of the learning algorithm.
We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods.
We then show that bounded CMI implies various forms of generalization; an illustrative plug-in estimate of a related, evaluated-CMI-style quantity is sketched after this list.
arXiv Detail & Related papers (2020-01-24T18:13:04Z)
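To make the CMI discussion concrete, below is a minimal, illustrative sketch of a Monte Carlo plug-in estimate of a per-sample, evaluated-CMI-style quantity under 0-1 loss. The toy nearest-centroid learner, the synthetic Gaussian data, and the per-sample/unconditional simplification are all assumptions of this sketch and are not taken from any of the papers above.

```python
# Illustrative sketch (assumptions only, not any paper's method): estimate the
# mutual information between a selector bit S_i and the pair of 0-1 losses of a
# trained predictor on the two samples of the i-th supersample pair, in the
# spirit of the evaluated-CMI / f-CMI literature.

import numpy as np
from collections import Counter


def nearest_centroid_train(X, y):
    """Toy learner: one centroid per class (stands in for an arbitrary algorithm)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(model, X):
    classes, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


def plug_in_mi(pairs):
    """Plug-in mutual information (in nats) between two discrete variables,
    given a list of (a, b) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return max(mi, 0.0)


def estimate_ecmi_per_sample(n=50, trials=400, seed=0):
    """Monte Carlo: fresh supersample and selector bits each trial; record
    (S_i, loss on held-in sample, loss on held-out sample) and pool over i."""
    rng = np.random.default_rng(seed)
    records = []
    for _ in range(trials):
        # Supersample: n pairs of labelled 2-d Gaussian points, two classes.
        y_tilde = rng.integers(0, 2, size=(n, 2))
        X_tilde = rng.normal(size=(n, 2, 2)) + 2.0 * y_tilde[..., None]
        s = rng.integers(0, 2, size=n)  # selector bits
        idx = np.arange(n)
        X_train, y_train = X_tilde[idx, s], y_tilde[idx, s]
        model = nearest_centroid_train(X_train, y_train)
        # 0-1 losses of the trained predictor on both elements of every pair.
        losses = np.stack(
            [(nearest_centroid_predict(model, X_tilde[:, j]) != y_tilde[:, j]).astype(int)
             for j in range(2)], axis=1)
        for i in range(n):
            records.append((int(s[i]), (int(losses[i, 0]), int(losses[i, 1]))))
    return plug_in_mi(records)  # per-sample estimate, in nats


if __name__ == "__main__":
    mi = estimate_ecmi_per_sample()
    print(f"estimated per-sample evaluated-CMI-style quantity: {mi:.4f} nats")
```

Plugging such an estimate into a bound of the form $\sqrt{2\,\mathrm{CMI}/n}$ would only be heuristic here, since this estimator targets a relaxed per-sample quantity rather than the exact conditional mutual information of the papers above.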