Minimum Description Length and Generalization Guarantees for
Representation Learning
- URL: http://arxiv.org/abs/2402.03254v1
- Date: Mon, 5 Feb 2024 18:12:28 GMT
- Title: Minimum Description Length and Generalization Guarantees for
Representation Learning
- Authors: Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski
- Abstract summary: This paper presents a framework that allows us to derive upper bounds on the generalization error of a representation learning algorithm.
Rather than the mutual information between the encoder's input and the representation, our new bounds involve the "multi-letter" relative entropy.
To the best of the authors' knowledge, the established generalization bounds are the first of their kind for Information Bottleneck (IB)-type encoders and representation learning.
- Score: 16.2444595840653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A major challenge in designing efficient statistical supervised learning
algorithms is finding representations that perform well not only on available
training samples but also on unseen data. While the study of representation
learning has spurred much interest, most existing approaches are heuristic, and very little is known about theoretical generalization guarantees.
In this paper, we establish a compressibility framework that allows us to
derive upper bounds on the generalization error of a representation learning
algorithm in terms of the "Minimum Description Length" (MDL) of the labels or
the latent variables (representations). Rather than the mutual information between the encoder's input and the representation, which the related literature often takes to reflect the algorithm's generalization capability but which in fact falls short of doing so, our new bounds involve the "multi-letter" relative entropy between the distribution of the representations (or labels) of the training and test sets and a fixed prior. In particular, these new bounds
reflect the structure of the encoder and are not vacuous for deterministic
algorithms. Our compressibility approach, which is information-theoretic in
nature, builds upon that of Blum-Langford for PAC-MDL bounds and introduces two
essential ingredients: block-coding and lossy-compression. The latter allows
our approach to subsume the so-called geometrical compressibility as a special
case. To the best of the authors' knowledge, the established generalization bounds are the first of their kind for Information Bottleneck (IB)-type
encoders and representation learning. Finally, we partly exploit the
theoretical results by introducing a new data-dependent prior. Numerical simulations illustrate the advantages of such well-chosen priors over the classical priors used in IB.
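The final sentence lends itself to a quick illustration: in variational IB-style training, the KL regularizer on the representation is computed against a prior, and that prior can either be the classical fixed N(0, I) or be fit on (a split of) the training data. The sketch below is a minimal toy example, not the paper's actual construction or bound; the names (Encoder, kl_diag_gaussians, ib_loss) and the particular data-dependent prior (moments of the encoder's means on a held-out split) are hypothetical choices made here for illustration, assuming a PyTorch setup.

```python
# Illustrative sketch only (assumed PyTorch setup; not the paper's construction):
# a variational IB-style objective whose KL term is taken either against the
# classical fixed N(0, I) prior or against a simple data-dependent prior.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Stochastic encoder q(z|x) = N(mu(x), diag(exp(log_var(x))))."""

    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, z_dim)
        self.log_var = nn.Linear(128, z_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.mu(h), self.log_var(h)


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians, summed over z-dims."""
    var_q, var_p = log_var_q.exp(), log_var_p.exp()
    return (0.5 * (log_var_p - log_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)).sum(-1)


def ib_loss(encoder, head, x, y, prior_mu, prior_log_var, beta=1e-2):
    """Cross-entropy plus beta * KL(q(z|x) || prior); the prior may be fixed or data-dependent."""
    mu, log_var = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization trick
    ce = F.cross_entropy(head(z), y)
    kl = kl_diag_gaussians(mu, log_var, prior_mu, prior_log_var).mean()
    return ce + beta * kl


x_dim, z_dim, n_classes = 32, 16, 10
enc, head = Encoder(x_dim, z_dim), nn.Linear(z_dim, n_classes)

# A simple (hypothetical) data-dependent prior: moments of the encoder's means
# computed on a separate split of the training data.
x_split = torch.randn(256, x_dim)  # stand-in for a held-out training split
with torch.no_grad():
    mu_split, _ = enc(x_split)
    dd_mu = mu_split.mean(0)
    dd_log_var = mu_split.var(0).clamp_min(1e-6).log()

x, y = torch.randn(64, x_dim), torch.randint(0, n_classes, (64,))
loss_fixed_prior = ib_loss(enc, head, x, y, torch.zeros(z_dim), torch.zeros(z_dim))
loss_data_dep_prior = ib_loss(enc, head, x, y, dd_mu, dd_log_var)
```

Only the prior arguments differ between the two losses; this is the kind of choice (classical fixed prior versus a well-chosen data-dependent one) that the paper's numerical simulations compare, though with the paper's own prior construction rather than the toy one above.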
Related papers
- An Information-Theoretic Approach to Generalization Theory [27.87324770020133]
We analyze information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.
We show that algorithms with a bounded maximal leakage guarantee generalization even with a constant privacy parameter.
arXiv Detail & Related papers (2024-08-20T10:08:21Z)
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Self-Supervised Learning via Maximum Entropy Coding [57.56570417545023]
We propose Maximum Entropy Coding (MEC) as a principled objective that explicitly optimizes the structure of the representation.
MEC learns a more generalizable representation than previous methods based on specific pretext tasks.
It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking.
arXiv Detail & Related papers (2022-10-20T17:58:30Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in deep-learning scenarios.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Deep Equilibrium Assisted Block Sparse Coding of Inter-dependent Signals: Application to Hyperspectral Imaging [71.57324258813675]
A dataset of inter-dependent signals is defined as a matrix whose columns exhibit strong dependencies.
A neural network is employed to act as a structure prior and to reveal the underlying signal interdependencies.
Deep-unrolling and deep-equilibrium based algorithms are developed, forming highly interpretable and concise deep-learning-based architectures.
arXiv Detail & Related papers (2022-03-29T21:00:39Z)
- Approximability and Generalisation [0.0]
We study the role of approximability in learning, both in the full precision and the approximated settings of the predictor.
We show that under mild conditions, approximable target concepts are learnable from a smaller labelled sample.
We give algorithms that guarantee a good predictor whose approximation also enjoys the same generalisation guarantees.
arXiv Detail & Related papers (2022-03-15T15:21:48Z)
- Information Complexity and Generalization Bounds [0.0]
We present a unifying picture of PAC-Bayesian and mutual information-based upper bounds on randomized learning algorithms.
We discuss two practical examples for learning with neural networks, namely Entropy-SGD and PAC-Bayes SGD.
arXiv Detail & Related papers (2021-05-04T20:37:57Z)
- Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm for semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)