Generalization Guarantees via Algorithm-dependent Rademacher Complexity
- URL: http://arxiv.org/abs/2307.02501v1
- Date: Tue, 4 Jul 2023 18:37:38 GMT
- Title: Generalization Guarantees via Algorithm-dependent Rademacher Complexity
- Authors: Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut
Simsekli
- Abstract summary: We propose a measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class.
We obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term.
- Score: 33.408679482592426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Algorithm- and data-dependent generalization bounds are required to explain
the generalization behavior of modern machine learning algorithms. In this
context, there exist information-theoretic generalization bounds that involve
(various forms of) mutual information, as well as bounds based on hypothesis
set stability. We propose a conceptually related, but technically distinct
complexity measure to control generalization error, which is the empirical
Rademacher complexity of an algorithm- and data-dependent hypothesis class.
Combining standard properties of Rademacher complexity with the convenient
structure of this class, we are able to (i) obtain novel bounds based on the
finite fractal dimension, which (a) extend previous fractal dimension-type
bounds from continuous to finite hypothesis classes, and (b) avoid a mutual
information term that was required in prior work; (ii) we greatly simplify the
proof of a recent dimension-independent generalization bound for stochastic
gradient descent; and (iii) we easily recover results for VC classes and
compression schemes, similar to approaches based on conditional mutual
information.
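To make the central quantity concrete, the block below writes out the standard empirical Rademacher complexity, here evaluated on an algorithm- and data-dependent class. The notation is generic (sample S = (z_1, ..., z_n), loss function \ell, and a class \mathcal{H}_S produced by the algorithm from data related to S); the paper's precise construction of \mathcal{H}_S is not reproduced here, so this is only an illustrative sketch.

```latex
% Empirical Rademacher complexity of an algorithm- and data-dependent
% class H_S (illustrative; the paper's exact construction of H_S may differ).
% sigma_1, ..., sigma_n are i.i.d. uniform random signs in {-1, +1}.
\widehat{\mathfrak{R}}_S(\mathcal{H}_S)
  \;=\; \mathbb{E}_{\sigma}\!\left[\,
      \sup_{h \in \mathcal{H}_S} \; \frac{1}{n}
      \sum_{i=1}^{n} \sigma_i \, \ell(h, z_i)
    \right]
```

For a fixed hypothesis class, standard symmetrization arguments turn this quantity into high-probability uniform bounds on the generalization gap of order \widehat{\mathfrak{R}}_S(\mathcal{H}_S) + \sqrt{\log(1/\delta)/n} for bounded losses. The point emphasized in the abstract is that \mathcal{H}_S itself depends on the algorithm and the data, and that its structure is convenient enough to combine such bounds with fractal-dimension, VC, and compression arguments.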
Related papers
- On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds [11.30047438005394]
This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification.
We quantitatively characterize the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$, confirming the intuition that, in the data-scarce regime, the type of regularization becomes increasingly important for adversarial training as perturbations grow in size.
arXiv Detail & Related papers (2024-10-21T14:53:12Z)
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z)
- The Computational Complexity of Concise Hypersphere Classification [49.57441416941195]
This paper is the first complexity-theoretic study of the hypersphere classification problem for binary data.
We use the fine-grained parameterized complexity paradigm to analyze the impact of structural properties that may be present in the input data.
arXiv Detail & Related papers (2023-12-12T09:33:03Z)
- A unified framework for information-theoretic generalization bounds [8.04975023021212]
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces.
arXiv Detail & Related papers (2023-05-18T15:36:20Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Tensor Completion with Provable Consistency and Fairness Guarantees for Recommender Systems [5.099537069575897]
We introduce a new consistency-based approach for defining and solving nonnegative/positive matrix and tensor completion problems.
We show that a single property/constraint, preserving unit-scale consistency, guarantees both the existence of a solution and, under relatively weak support assumptions, its uniqueness.
arXiv Detail & Related papers (2022-04-04T19:42:46Z)
- Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms [12.020634332110147]
We prove novel generalization bounds through the lens of rate-distortion theory.
Our results bring a more unified perspective on generalization and open up several future research directions.
arXiv Detail & Related papers (2022-03-04T18:12:31Z)
- Partial Counterfactual Identification from Observational and Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes -- the sparse regime, and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.