Minimum Description Length and Generalization Guarantees for
Representation Learning
- URL: http://arxiv.org/abs/2402.03254v1
- Date: Mon, 5 Feb 2024 18:12:28 GMT
- Title: Minimum Description Length and Generalization Guarantees for
Representation Learning
- Authors: Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski
- Abstract summary: This paper presents a framework that allows us to derive upper bounds on the generalization error of a representation learning algorithm.
Rather than the mutual information between the encoder's input and the representation, our new bounds involve the "multi-letter" relative entropy.
To the best of the authors' knowledge, the established generalization bounds are the first of their kind for Information Bottleneck (IB)-type encoders and representation learning.
- Score: 16.2444595840653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A major challenge in designing efficient statistical supervised learning
algorithms is finding representations that perform well not only on available
training samples but also on unseen data. While the study of representation
learning has spurred much interest, most existing approaches are heuristic, and very little is known about theoretical generalization guarantees.
In this paper, we establish a compressibility framework that allows us to
derive upper bounds on the generalization error of a representation learning
algorithm in terms of the "Minimum Description Length" (MDL) of the labels or
the latent variables (representations). Rather than the mutual information between the encoder's input and the representation, which the related literature often takes to reflect the algorithm's generalization capability but which in fact falls short of doing so, our new bounds involve the "multi-letter" relative entropy between the distribution of the representations (or labels) of the training and test sets and a fixed prior. In particular, these new bounds
reflect the structure of the encoder and are not vacuous for deterministic
algorithms. Our compressibility approach, which is information-theoretic in
nature, builds upon that of Blum-Langford for PAC-MDL bounds and introduces two
essential ingredients: block-coding and lossy-compression. The latter allows
our approach to subsume the so-called geometrical compressibility as a special
case. To the best of the authors' knowledge, the established generalization bounds are the first of their kind for Information Bottleneck (IB)-type
encoders and representation learning. Finally, we partly exploit the
theoretical results by introducing a new data-dependent prior. Numerical simulations illustrate the advantages of such well-chosen priors over the classical priors used in IB.
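The final sentence lends itself to a quick illustration: in variational IB-style training, the KL regularizer on the representation is computed against a prior, and that prior can either be the classical fixed N(0, I) or be fit on (a split of) the training data. The sketch below is a minimal toy example, not the paper's actual construction or bound; the names (Encoder, kl_diag_gaussians, ib_loss) and the particular data-dependent prior (moments of the encoder's means on a held-out split) are hypothetical choices made here for illustration, assuming a PyTorch setup.

```python
# Illustrative sketch only (assumed PyTorch setup; not the paper's construction):
# a variational IB-style objective whose KL term is taken either against the
# classical fixed N(0, I) prior or against a simple data-dependent prior.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Stochastic encoder q(z|x) = N(mu(x), diag(exp(log_var(x))))."""

    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, z_dim)
        self.log_var = nn.Linear(128, z_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.mu(h), self.log_var(h)


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians, summed over z-dims."""
    var_q, var_p = log_var_q.exp(), log_var_p.exp()
    return (0.5 * (log_var_p - log_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)).sum(-1)


def ib_loss(encoder, head, x, y, prior_mu, prior_log_var, beta=1e-2):
    """Cross-entropy plus beta * KL(q(z|x) || prior); the prior may be fixed or data-dependent."""
    mu, log_var = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization trick
    ce = F.cross_entropy(head(z), y)
    kl = kl_diag_gaussians(mu, log_var, prior_mu, prior_log_var).mean()
    return ce + beta * kl


x_dim, z_dim, n_classes = 32, 16, 10
enc, head = Encoder(x_dim, z_dim), nn.Linear(z_dim, n_classes)

# A simple (hypothetical) data-dependent prior: moments of the encoder's means
# computed on a separate split of the training data.
x_split = torch.randn(256, x_dim)  # stand-in for a held-out training split
with torch.no_grad():
    mu_split, _ = enc(x_split)
    dd_mu = mu_split.mean(0)
    dd_log_var = mu_split.var(0).clamp_min(1e-6).log()

x, y = torch.randn(64, x_dim), torch.randint(0, n_classes, (64,))
loss_fixed_prior = ib_loss(enc, head, x, y, torch.zeros(z_dim), torch.zeros(z_dim))
loss_data_dep_prior = ib_loss(enc, head, x, y, dd_mu, dd_log_var)
```

Only the prior arguments differ between the two losses; this is the kind of choice (classical fixed prior versus a well-chosen data-dependent one) that the paper's numerical simulations compare, though with the paper's own prior construction rather than the toy one above.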
Related papers
- An Information-Theoretic Approach to Generalization Theory [27.87324770020133]
We analyze information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.
We show that algorithms with a bounded maximal leakage guarantee generalization even with a constant privacy parameter.
arXiv Detail & Related papers (2024-08-20T10:08:21Z)
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Self-Supervised Learning via Maximum Entropy Coding [57.56570417545023]
We propose Maximum Entropy Coding (MEC) as a principled objective that explicitly optimizes the structure of the representation.
MEC learns a more generalizable representation than previous methods based on specific pretext tasks.
It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking.
arXiv Detail & Related papers (2022-10-20T17:58:30Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in deep-learning scenarios.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Deep Equilibrium Assisted Block Sparse Coding of Inter-dependent Signals: Application to Hyperspectral Imaging [71.57324258813675]
A dataset of inter-dependent signals is defined as a matrix whose columns exhibit strong dependencies.
A neural network is employed to act as a structure prior and to reveal the underlying signal interdependencies.
Deep-unrolling and deep-equilibrium based algorithms are developed, forming highly interpretable and concise deep-learning-based architectures.
arXiv Detail & Related papers (2022-03-29T21:00:39Z)
- Approximability and Generalisation [0.0]
We study the role of approximability in learning, both in the full precision and the approximated settings of the predictor.
We show that under mild conditions, approximable target concepts are learnable from a smaller labelled sample.
We give algorithms that guarantee a good predictor whose approximation also enjoys the same generalisation guarantees.
arXiv Detail & Related papers (2022-03-15T15:21:48Z)
- Information Complexity and Generalization Bounds [0.0]
We present a unifying picture of PAC-Bayesian and mutual information-based upper bounds on randomized learning algorithms.
We discuss two practical examples for learning with neural networks, namely Entropy-SGD and PAC-Bayes SGD.
arXiv Detail & Related papers (2021-05-04T20:37:57Z)
- Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm for semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)