Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms
- URL: http://arxiv.org/abs/2203.02474v1
- Date: Fri, 4 Mar 2022 18:12:31 GMT
- Title: Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms
- Authors: Milad Sefidgaran, Amin Gohari, Gaël Richard, Umut Şimşekli
- Abstract summary: We prove novel generalization bounds through the lens of rate-distortion theory.
Our results bring a more unified perspective on generalization and open up several future research directions.
- Score: 12.020634332110147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding generalization in modern machine learning settings has been one
of the major challenges in statistical learning theory. In this context, recent
years have witnessed the development of various generalization bounds
suggesting different complexity notions such as the mutual information between
the data sample and the algorithm output, compressibility of the hypothesis
space, and the fractal dimension of the hypothesis space. While these bounds
have illuminated the problem at hand from different angles, their suggested
complexity notions might appear seemingly unrelated, thereby restricting their
high-level impact. In this study, we prove novel generalization bounds through
the lens of rate-distortion theory, and explicitly relate the concepts of
mutual information, compressibility, and fractal dimensions in a single
mathematical framework. Our approach consists of (i) defining a generalized
notion of compressibility by using source coding concepts, and (ii) showing
that the 'compression error rate' can be linked to the generalization error
both in expectation and with high probability. We show that in the 'lossless
compression' setting, we recover and improve existing mutual information-based
bounds, whereas a 'lossy compression' scheme allows us to link generalization
to the rate-distortion dimension -- a particular notion of fractal dimension.
Our results bring a more unified perspective on generalization and open up
several future research directions.
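For context, a minimal sketch of the kind of mutual information-based bound that the 'lossless compression' setting recovers (the standard bound of Xu and Raginsky, 2017; the constants and the improved variants derived in this paper may differ) is
$$ \Big| \mathbb{E}\big[ \mathcal{L}(W) - \hat{\mathcal{L}}_S(W) \big] \Big| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S; W)}, $$
where $S = (Z_1, \dots, Z_n)$ is the training sample of size $n$, $W$ is the output of the stochastic learning algorithm, $\mathcal{L}$ and $\hat{\mathcal{L}}_S$ denote the population and empirical risks, and the loss is assumed to be $\sigma$-subgaussian for every fixed hypothesis.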
Related papers
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to lie exactly on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z)
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation [59.138470433237615]
We introduce statistical metrics that quantify both the linguistic and visual skew of a dataset for relational learning.
We show that systematically controlled metrics are strongly predictive of generalization performance.
This work points to an important direction: enhancing data diversity or balance rather than simply scaling up the absolute dataset size.
arXiv Detail & Related papers (2024-03-25T03:18:39Z)
- Generalization Guarantees via Algorithm-dependent Rademacher Complexity [33.408679482592426]
We propose a measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class (the standard definition of empirical Rademacher complexity is recalled after this list).
We obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term.
arXiv Detail & Related papers (2023-07-04T18:37:38Z)
- A unified framework for information-theoretic generalization bounds [8.04975023021212]
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces (the standard $\psi_p$ Orlicz norm is recalled after this list).
arXiv Detail & Related papers (2023-05-18T15:36:20Z)
- Generalization Bounds with Data-dependent Fractal Dimensions [5.833272638548154]
We prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption.
Despite introducing significant technical complications, this data-dependent notion of fractal dimension lets us control the generalization error.
arXiv Detail & Related papers (2023-02-06T13:24:48Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Non-Linear Spectral Dimensionality Reduction Under Uncertainty [107.01839211235583]
We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches.
We show that the proposed NGEU formulation exhibits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework.
arXiv Detail & Related papers (2022-02-09T19:01:33Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of a stochastic optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
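For reference, the standard empirical Rademacher complexity recalled in the entry on algorithm-dependent Rademacher complexity above is, for a hypothesis class $\mathcal{F}$ and a sample $S = (z_1, \dots, z_n)$,
$$ \hat{\mathfrak{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(z_i) \right], $$
where the $\sigma_i$ are i.i.d. Rademacher (uniform $\pm 1$) variables; that paper evaluates this quantity on an algorithm- and data-dependent hypothesis class rather than on a fixed $\mathcal{F}$.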
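Similarly, the $\psi_p$ Orlicz norm recalled in the entry on the unified information-theoretic framework above is the standard one,
$$ \|X\|_{\psi_p} \;=\; \inf\Big\{ t > 0 \,:\, \mathbb{E}\big[ \exp\big( |X|^p / t^p \big) \big] \le 2 \Big\}, $$
so that $p = 2$ corresponds to sub-Gaussian and $p = 1$ to sub-exponential tails; the specific relaxation of Young's inequality used in that paper is not reproduced here.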