Understanding Generalization via Leave-One-Out Conditional Mutual Information
- URL: http://arxiv.org/abs/2206.14800v1
- Date: Wed, 29 Jun 2022 17:57:37 GMT
- Title: Understanding Generalization via Leave-One-Out Conditional Mutual Information
- Authors: Mahdi Haghifam, Shay Moran, Daniel M. Roy, Gintare Karolina Dziugaite
- Abstract summary: Leave-one-out variants of the conditional mutual information (CMI) of an algorithm are seen to control the mean generalization error of learning algorithms with bounded loss functions.
For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk.
- Score: 37.49575289458763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the mutual information between (certain summaries of) the output of
a learning algorithm and its $n$ training data, conditional on a supersample of
$n+1$ i.i.d. data from which the training data is chosen at random without
replacement. These leave-one-out variants of the conditional mutual information
(CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control
the mean generalization error of learning algorithms with bounded loss
functions. For learning algorithms achieving zero empirical risk under 0-1 loss
(i.e., interpolating algorithms), we provide an explicit connection between
leave-one-out CMI and the classical leave-one-out error estimate of the risk.
Using this connection, we obtain upper and lower bounds on risk in terms of the
(evaluated) leave-one-out CMI. When the limiting risk is constant or decays
polynomially, the bounds converge to within a constant factor of two. As an
application, we analyze the population risk of the one-inclusion graph
algorithm, a general-purpose transductive learning algorithm for VC classes in
the realizable setting. Using leave-one-out CMI, we match the optimal bound for
learning VC classes in the realizable setting, answering an open challenge
raised by Steinke and Zakynthinou (2020). Finally, in order to understand the
role of leave-one-out CMI in studying generalization, we place leave-one-out
CMI in a hierarchy of measures, with a novel unconditional mutual information
at the root. For 0-1 loss and interpolating learning algorithms, this mutual
information is observed to be precisely the risk.
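As a rough sketch of the quantities involved, the display below paraphrases the supersample construction in illustrative notation; the symbols $f$, $\mathrm{looCMI}$, and $\widehat{R}_{\mathrm{loo}}$ are naming choices made here, and the paper's precise definitions may differ.
```latex
% Illustrative notation only; the paper's precise definitions may differ.
% Supersample of n+1 i.i.d. points and a uniformly random left-out index:
\[
  \tilde{Z} = (\tilde{Z}_1, \dots, \tilde{Z}_{n+1}) \sim \mathcal{D}^{n+1},
  \qquad U \sim \mathrm{Unif}\{1, \dots, n+1\},
  \qquad \tilde{Z}_{-U} = (\tilde{Z}_i)_{i \neq U}.
\]
% Leave-one-out CMI of an algorithm A (possibly of a summary f of its output),
% conditional on the supersample:
\[
  \mathrm{looCMI}(A) \;=\; I\big(f(A(\tilde{Z}_{-U})) ;\, U \,\big|\, \tilde{Z}\big).
\]
% Classical leave-one-out error estimate of the risk (loss \ell), to which
% the abstract relates looCMI for interpolating algorithms under 0-1 loss:
\[
  \widehat{R}_{\mathrm{loo}}(A, \tilde{Z})
  \;=\; \frac{1}{n+1} \sum_{i=1}^{n+1} \ell\big(A(\tilde{Z}_{-i}), \tilde{Z}_i\big).
\]
```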
Related papers
- On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $\phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI)
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation (see the sketch after this list).
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery [88.97076030698433]
We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery.
CIC explicitly incentivizes diverse behaviors by maximizing state entropy.
We find that CIC substantially improves over prior unsupervised skill discovery methods.
arXiv Detail & Related papers (2022-02-01T00:36:29Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Towards a Unified Information-Theoretic Framework for Generalization [43.23033319683591]
We first demonstrate that one can use the CMI framework to express non-trivial (but sub-optimal) bounds for any learning algorithm.
We prove that the CMI framework yields the optimal bound on the expected risk of Support Vector Machines (SVMs) for learning halfspaces.
arXiv Detail & Related papers (2021-11-09T17:09:40Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
- Generalization Bounds via Information Density and Conditional Information Density [14.147617330278662]
We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms.
We provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios.
arXiv Detail & Related papers (2020-05-16T17:04:24Z)
- Reasoning About Generalization via Conditional Mutual Information [26.011933885798506]
We use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the output of the learning algorithm.
We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods.
We then show that bounded CMI implies various forms of generalization.
arXiv Detail & Related papers (2020-01-24T18:13:04Z)
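To make the classical leave-one-out notion referenced above concrete, here is a minimal sketch using scikit-learn; it computes the ordinary leave-one-out error estimate for an interpolating classifier and is not an implementation of any CMI bound from these papers. The dataset and the 1-nearest-neighbor model are arbitrary choices for illustration.
```python
# Generic sketch (not an implementation of any bound from the papers above):
# the classical leave-one-out error estimate of the risk, via scikit-learn.
# The dataset and the 1-nearest-neighbor model are arbitrary illustrative
# choices; 1-NN interpolates, i.e. it has zero empirical risk under 0-1 loss.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=1)

# Leave one point out, train on the rest, test on the held-out point,
# and average the 0-1 loss over all splits.
scores = cross_val_score(clf, X, y, cv=LeaveOneOut(), scoring="accuracy")
loo_error = 1.0 - scores.mean()
print(f"Leave-one-out error estimate: {loo_error:.3f}")
```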