Reasoning About Generalization via Conditional Mutual Information
- URL: http://arxiv.org/abs/2001.09122v3
- Date: Fri, 19 Jun 2020 00:42:03 GMT
- Title: Reasoning About Generalization via Conditional Mutual Information
- Authors: Thomas Steinke and Lydia Zakynthinou
- Abstract summary: We use Conditional Mutual Information (CMI) to quantify how well the input can be recognized given the output.
We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods.
We then show that bounded CMI implies various forms of generalization.
- Score: 26.011933885798506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide an information-theoretic framework for studying the generalization
properties of machine learning algorithms. Our framework ties together existing
approaches, including uniform convergence bounds and recent methods for
adaptive data analysis. Specifically, we use Conditional Mutual Information
(CMI) to quantify how well the input (i.e., the training data) can be
recognized given the output (i.e., the trained model) of the learning
algorithm. We show that bounds on CMI can be obtained from VC dimension,
compression schemes, differential privacy, and other methods. We then show that
bounded CMI implies various forms of generalization.
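To make "how well the input can be recognized given the output" concrete, here is a small brute-force sketch. In the paper's setup, a supersample Z~ holds n rows of two i.i.d. points each, a uniformly random selector S in {0,1}^n picks one point per row as training data Z~_S, and CMI is I(A(Z~_S); S | Z~). The code below computes this exactly for a deterministic algorithm over a tiny discrete domain by enumerating every supersample and selector. The function names (`cmi`, `entropy`) and the toy learners (`memorize`, `constant`, `count_ones`) are hypothetical illustrations chosen for this sketch, not code from the paper.

```python
import itertools
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in nats, of the distribution described by a Counter of outcomes."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def cmi(algorithm, probs, n):
    """Exact I(A(Z~_S); S | Z~) for a deterministic `algorithm`, by brute-force enumeration.

    Z~ holds n rows (z_i0, z_i1), each entry drawn i.i.d. from the discrete
    distribution `probs` (a dict value -> probability); S is uniform on {0,1}^n
    and row i contributes z_{i,S_i} to the training set.
    """
    cells = list(itertools.product(probs, repeat=2))  # all possible rows (z_i0, z_i1)
    total = 0.0
    for rows in itertools.product(cells, repeat=n):
        # Probability of this supersample under the i.i.d. draw.
        weight = math.prod(probs[a] * probs[b] for a, b in rows)
        if weight == 0.0:
            continue
        # Output distribution over the 2^n equally likely selectors.
        outputs = Counter(
            algorithm(tuple(row[s] for row, s in zip(rows, sel)))
            for sel in itertools.product((0, 1), repeat=n)
        )
        # A is deterministic, so H(W | S, Z~) = 0 and I(W; S | Z~ = z) = H(W | Z~ = z).
        total += weight * entropy(outputs)
    return total


# Hypothetical toy learners on binary data.
def memorize(train):
    return train  # output the training set itself


def constant(train):
    return 0  # ignore the data entirely


def count_ones(train):
    return sum(train)  # keep only a summary statistic


if __name__ == "__main__":
    fair_coin = {0: 0.5, 1: 0.5}
    n = 3
    for learner in (memorize, constant, count_ones):
        value = cmi(learner, fair_coin, n)
        print(f"{learner.__name__}: CMI = {value:.4f} nats (max n*ln 2 = {n * math.log(2):.4f})")
```

With a fair coin and n = 3, `memorize` should give 1.5 ln 2 ≈ 1.0397 nats: each row's two entries differ with probability 1/2, and every differing row reveals one bit of S. `constant` gives 0, and `count_ones` falls strictly in between. No algorithm can exceed n ln 2, since S itself carries only n bits.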
Related papers
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning.
We conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning.
Date: 2024-10-11T18:02:46Z
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
Date: 2024-06-06T13:15:37Z
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
Date: 2023-06-27T16:59:06Z
- Information Theoretic Lower Bounds for Information Theoretic Upper Bounds [14.268363583731848]
We examine the relationship between the mutual information of the output model with the empirical sample and the generalization of the algorithm, in the context of stochastic convex optimization.
Our study reveals that, for true risk minimization, dimension-dependent mutual information is necessary.
Existing information-theoretic generalization bounds therefore fall short in capturing the generalization capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.
Date: 2023-02-09T20:42:36Z
- Methods for Recovering Conditional Independence Graphs: A Survey [2.2721854258621064]
Conditional Independence (CI) graphs are used to gain insights about feature relationships.
We list out different methods and study the advances in techniques developed to recover CI graphs.
Date: 2022-11-13T06:11:38Z
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
Date: 2022-07-01T17:58:29Z
- Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis [8.028776552383365]
We propose a generic understanding of both the conventional learning-to-learn framework and the modern model-agnostic meta-learning (MAML) algorithms.
We provide a data-dependent generalization bound for a variant of MAML, which is non-vacuous for deep few-shot learning.
Date: 2021-09-29T17:45:54Z
- Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
Date: 2021-08-23T09:31:18Z
- Learning Multimodal VAEs through Mutual Supervision [72.77685889312889]
MEME combines information between modalities implicitly through mutual supervision.
We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes.
Date: 2021-06-23T17:54:35Z
- Information Theoretic Meta Learning with Gaussian Processes [74.54485310507336]
We formulate meta learning using information theoretic concepts; namely, mutual information and the information bottleneck.
By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning.
Date: 2020-09-07T16:47:30Z