On Leave-One-Out Conditional Mutual Information For Generalization
- URL: http://arxiv.org/abs/2207.00581v1
- Date: Fri, 1 Jul 2022 17:58:29 GMT
- Title: On Leave-One-Out Conditional Mutual Information For Generalization
- Authors: Mohamad Rida Rammal, Alessandro Achille, Aditya Golatkar, Suhas
Diggavi, Stefano Soatto
- Abstract summary: We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in deep-learning scenarios.
- Score: 122.2734338600665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We derive information-theoretic generalization bounds for supervised
learning algorithms based on a new measure of leave-one-out conditional mutual
information (loo-CMI). Contrary to other CMI bounds, which are black-box bounds
that do not exploit the structure of the problem and may be hard to evaluate in
practice, our loo-CMI bounds can be computed easily and can be interpreted in
connection to other notions such as classical leave-one-out cross-validation,
stability of the optimization algorithm, and the geometry of the loss
landscape. They apply both to the outputs of training algorithms and to their
predictions. We empirically validate the quality of the bound by evaluating
its predicted generalization gap in deep-learning scenarios. In particular,
our bounds are non-vacuous on large-scale image-classification tasks.
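For orientation, one common way a leave-one-out CMI quantity is set up in this line of work is sketched below; the notation and the form of the final bound are illustrative assumptions, not taken verbatim from the paper.

```latex
% Illustrative loo-CMI setup: draw n+1 i.i.d. samples and a uniform index
% U of the point to hold out, then train algorithm A on the remaining n.
\[
Z_1,\dots,Z_{n+1} \stackrel{\text{i.i.d.}}{\sim} \mathcal{D}, \qquad
U \sim \mathrm{Unif}\{1,\dots,n+1\}, \qquad
W = \mathcal{A}\big(Z_{1:n+1}\setminus\{Z_U\}\big).
\]
% The loo-CMI is the information W reveals about which point was left
% out, given the full pool; it is at most log(n+1).
\[
\mathrm{looCMI}(\mathcal{A}) = I\big(W;\,U \,\big|\, Z_{1:n+1}\big)
  \;\le\; \log(n+1).
\]
```

Bounds in this family typically control the expected generalization gap by a quantity on the order of $\sqrt{\mathrm{looCMI}(\mathcal{A})/n}$ when the loss is bounded.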
Related papers
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning.
We conduct a generalization error analysis that reveals the limitations of the current InfoNCE-based contrastive loss for self-supervised representation learning; a generic form of this loss is sketched below.
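A minimal sketch of a generic InfoNCE-style contrastive loss of the kind such analyses target; the function name and temperature value are illustrative, not this paper's exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1):
    """Generic InfoNCE loss: row i of z_a is the positive for row i of z_b;
    every other row in the batch serves as a negative."""
    z_a = F.normalize(z_a, dim=1)               # unit-norm embeddings
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature        # (N, N) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)     # positives lie on the diagonal
```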
arXiv Detail & Related papers (2024-10-11T18:02:46Z) - An Information-Theoretic Approach to Generalization Theory [27.87324770020133]
We analyze information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.
We show that algorithms with a bounded maximal leakage guarantee generalization even with a constant privacy parameter.
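For reference, the usual definition of maximal leakage from a sample $S$ to an algorithm output $W$, stated here in its standard discrete-output form from the leakage literature (which may differ cosmetically from the paper's notation):

```latex
% Maximal leakage from S to W: the worst-case multiplicative gain in
% guessing any function of S after observing W.
\[
\mathcal{L}(S \to W)
  \;=\; \log \sum_{w} \max_{s:\, P_S(s) > 0} P_{W \mid S}(w \mid s).
\]
```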
arXiv Detail & Related papers (2024-08-20T10:08:21Z) - Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to lie exactly on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z) - Information Theoretic Lower Bounds for Information Theoretic Upper
Bounds [14.268363583731848]
We examine the relationship between the mutual information of the output model with the training sample and the generalization of the algorithm, in the context of stochastic convex optimization.
Our study reveals that, for true risk minimization, mutual information is necessary.
Existing information-theoretic generalization bounds fall short in capturing the capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.
arXiv Detail & Related papers (2023-02-09T20:42:36Z) - On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z) - Understanding Generalization via Leave-One-Out Conditional Mutual
Information [37.49575289458763]
Leave-one-out variants of the conditional mutual information (CMI) of an algorithm are shown to control the mean generalization error of learning algorithms with bounded loss functions.
For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk.
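The classical leave-one-out estimate referenced here is the standard one; writing it out with illustrative notation makes the connection concrete:

```latex
% Classical leave-one-out error estimate over an (n+1)-point pool:
% retrain with each point held out in turn and average its loss.
\[
\widehat{R}_{\mathrm{loo}}
  \;=\; \frac{1}{n+1} \sum_{i=1}^{n+1}
  \ell\Big(\mathcal{A}\big(Z_{1:n+1}\setminus\{Z_i\}\big),\, Z_i\Big).
\]
```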
arXiv Detail & Related papers (2022-06-29T17:57:37Z) - CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery [88.97076030698433]
We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery.
CIC explicitly incentivizes diverse behaviors by maximizing state entropy.
We find that CIC substantially improves over prior unsupervised skill discovery methods.
arXiv Detail & Related papers (2022-02-01T00:36:29Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be relied on as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
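For background, the conditional normalized maximum likelihood distribution underlying this line of work is standardly defined as follows; the notation is assumed for illustration, with $\hat{\theta}(\cdot)$ denoting the maximum-likelihood estimate on the augmented dataset:

```latex
% CNML: to score a label y for a query x, append (x, y) to the training
% set D, refit the model, and normalize across candidate labels.
\[
p_{\mathrm{CNML}}(y \mid x)
  \;=\; \frac{p\big(y \mid x;\, \hat{\theta}(D \cup \{(x,y)\})\big)}
             {\sum_{y'} p\big(y' \mid x;\, \hat{\theta}(D \cup \{(x,y')\})\big)}.
\]
```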
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Reasoning About Generalization via Conditional Mutual Information [26.011933885798506]
We use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the output of a learning algorithm.
We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods.
We then show that bounded CMI implies various forms of generalization.
arXiv Detail & Related papers (2020-01-24T18:13:04Z)
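To make the CMI quantity in this last entry concrete, here is the standard supersample construction from this framework, sketched with assumed notation: draw two independent samples per index and a random bit selecting which one is used for training.

```latex
% CMI setup: Z~ is an (n x 2) array of i.i.d. draws with columns indexed
% by {0,1}, S selects one entry per row as training data, and W is the
% algorithm's output on the selected points.
\[
\tilde{Z} \in \mathcal{Z}^{n \times 2}, \qquad
S \sim \mathrm{Unif}\{0,1\}^n, \qquad
W = \mathcal{A}\big(\tilde{Z}_{1,S_1},\dots,\tilde{Z}_{n,S_n}\big),
\]
\[
\mathrm{CMI}_{\mathcal{D}}(\mathcal{A})
  \;=\; I\big(W;\, S \,\big|\, \tilde{Z}\big) \;\le\; n \log 2.
\]
```

Bounded CMI then yields generalization-gap bounds typically on the order of $\sqrt{\mathrm{CMI}_{\mathcal{D}}(\mathcal{A})/n}$ for bounded losses.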