Information Complexity and Generalization Bounds
- URL: http://arxiv.org/abs/2105.01747v1
- Date: Tue, 4 May 2021 20:37:57 GMT
- Title: Information Complexity and Generalization Bounds
- Authors: Pradeep Kr. Banerjee, Guido Montúfar
- Abstract summary: We present a unifying picture of PAC-Bayesian and mutual information-based upper bounds on the generalization error of randomized learning algorithms.
We discuss two practical examples for learning with neural networks, namely, Entropy-SGD and PAC-Bayes-SGD.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a unifying picture of PAC-Bayesian and mutual information-based
upper bounds on the generalization error of randomized learning algorithms. As
we show, Tong Zhang's information exponential inequality (IEI) gives a general
recipe for constructing bounds of both flavors. We show that several important
results in the literature can be obtained as simple corollaries of the IEI
under different assumptions on the loss function. Moreover, we obtain new
bounds for data-dependent priors and unbounded loss functions. Optimizing the
bounds gives rise to variants of the Gibbs algorithm, for which we discuss two
practical examples for learning with neural networks, namely, Entropy-SGD and
PAC-Bayes-SGD. Further, we use an Occam's factor argument to show a
PAC-Bayesian bound that incorporates second-order curvature information of the
training loss.
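As a concrete anchor for the recipe described in the abstract, the following LaTeX sketch writes out the standard change-of-measure (Donsker-Varadhan) step that underlies both PAC-Bayesian and mutual-information bounds. The notation (loss bounded in [0,1], prior P, posterior Q, sample size n) is an illustrative assumption and not the paper's exact statement of Zhang's IEI.

```latex
% A minimal sketch of the change-of-measure recipe, assuming a loss in [0,1];
% notation is illustrative, not the paper's exact information exponential inequality.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% Donsker-Varadhan change of measure: for any prior $P$, posterior $Q$,
% and measurable $g$,
\begin{equation}
  \mathbb{E}_{W \sim Q}\,[g(W)]
  \;\le\;
  \mathrm{KL}(Q \,\|\, P) + \log \mathbb{E}_{W \sim P}\!\left[ e^{g(W)} \right].
\end{equation}

% Take $g(w) = \lambda\,(L(w) - \hat L_S(w))$ with population risk $L$ and
% empirical risk $\hat L_S$ on an $n$-point sample $S$. For a loss in $[0,1]$,
% Hoeffding's lemma gives
% $\mathbb{E}_S\,\mathbb{E}_{W \sim P}\, e^{g(W)} \le e^{\lambda^2/(8n)}$,
% and Markov's inequality over $S$ then yields, with probability at least
% $1-\delta$, simultaneously for all posteriors $Q$:
\begin{equation}
  \mathbb{E}_{W \sim Q}[L(W)]
  \;\le\;
  \mathbb{E}_{W \sim Q}[\hat L_S(W)]
  + \frac{\mathrm{KL}(Q \,\|\, P) + \log(1/\delta)}{\lambda}
  + \frac{\lambda}{8n}.
\end{equation}

% Taking expectations over $S$ instead of a high-probability statement, and
% choosing $P$ as the marginal law of $W$, turns the KL term into the mutual
% information $I(W;S)$, which is the mutual-information flavor of the same bound.

\end{document}
```

Choosing lambda from a fixed grid (with a union bound over the grid) recovers, up to constants and logarithmic factors, the familiar sqrt((KL(Q||P) + log(1/delta))/n) rate. The posterior minimizing the right-hand side of such a bound is a Gibbs posterior, Q(w) proportional to P(w) exp(-lambda * L_hat_S(w)). The Python sketch below approximately samples from a Gibbs posterior with stochastic gradient Langevin dynamics on a synthetic least-squares problem; the inverse temperature lam, the Gaussian prior scale sigma, and all other hyperparameters are illustrative assumptions, and this is not the Entropy-SGD or PAC-Bayes-SGD implementation discussed in the paper.

```python
# Toy, hypothetical illustration (not the paper's code): approximate sampling
# from a Gibbs posterior Q(w) proportional to P(w) * exp(-lam * L_hat(w)) with
# a Gaussian prior P = N(0, sigma^2 I), using stochastic gradient Langevin
# dynamics (SGLD) on synthetic least-squares data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

lam, sigma = 50.0, 1.0            # inverse temperature and prior scale (assumed)
eta, steps, batch = 1e-3, 5000, 32

def grad_empirical_loss(w, idx):
    # gradient of the mini-batch squared loss (1/|B|) * sum_i (x_i . w - y_i)^2
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(d)
samples = []
for t in range(steps):
    idx = rng.choice(n, size=batch, replace=False)
    # gradient of the Gibbs potential lam * L_hat(w) - log P(w), up to constants
    g = lam * grad_empirical_loss(w, idx) + w / sigma**2
    w = w - eta * g + np.sqrt(2.0 * eta) * rng.normal(size=d)  # SGLD step
    if t > steps // 2:            # keep post-burn-in iterates as posterior samples
        samples.append(w.copy())

w_gibbs = np.mean(samples, axis=0)     # posterior-mean predictor
print("distance of posterior mean to w_true:", np.linalg.norm(w_gibbs - w_true))
```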
Related papers
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z) - Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets [25.250314934981233]
We first apply the PAC-Bayesian framework on 'random sets' in a rigorous way, where the training algorithm is assumed to output a data-dependent hypothesis set.
This approach allows us to prove data-dependent bounds, which are applicable in numerous contexts.
arXiv Detail & Related papers (2024-04-26T14:28:18Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Information-Theoretic Generalization Bounds for Transductive Learning and its Applications [16.408850979966623]
We develop generalization bounds for transductive learning algorithms in the context of information theory and PAC-Bayesian theory.
Our theoretical results are validated on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-11-08T09:48:42Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - Mitigating the Effect of Incidental Correlations on Part-based Learning [50.682498099720114]
Part-based representations could be more interpretable and generalize better with limited data.
We present two innovative regularization methods for part-based representations.
We exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets.
arXiv Detail & Related papers (2023-09-30T13:44:48Z) - Information Theoretic Lower Bounds for Information Theoretic Upper Bounds [14.268363583731848]
We examine the relationship between the information that the output model carries about the training sample and the generalization of the algorithm, in the context of convex optimization.
Our study reveals that, for true risk minimization, mutual information is necessary.
Existing information-theoretic generalization bounds fall short in capturing the capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.
arXiv Detail & Related papers (2023-02-09T20:42:36Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - PAC-Bayes unleashed: generalisation bounds with unbounded losses [12.078257783674923]
We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions.
This extends the relevance and applicability of the PAC-Bayes learning framework.
arXiv Detail & Related papers (2020-06-12T15:55:46Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.