Information Complexity and Generalization Bounds
- URL: http://arxiv.org/abs/2105.01747v1
- Date: Tue, 4 May 2021 20:37:57 GMT
- Title: Information Complexity and Generalization Bounds
- Authors: Pradeep Kr. Banerjee, Guido Montúfar
- Abstract summary: We present a unifying picture of PAC-Bayesian and mutual information-based upper bounds on the generalization error of randomized learning algorithms.
We discuss two practical examples for learning with neural networks, namely, Entropy-SGD and PAC-Bayes-SGD.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a unifying picture of PAC-Bayesian and mutual information-based
upper bounds on the generalization error of randomized learning algorithms. As
we show, Tong Zhang's information exponential inequality (IEI) gives a general
recipe for constructing bounds of both flavors. We show that several important
results in the literature can be obtained as simple corollaries of the IEI
under different assumptions on the loss function. Moreover, we obtain new
bounds for data-dependent priors and unbounded loss functions. Optimizing the
bounds gives rise to variants of the Gibbs algorithm, for which we discuss two
practical examples for learning with neural networks, namely, Entropy-SGD and
PAC-Bayes-SGD. Further, we use an Occam's factor argument to show a
PAC-Bayesian bound that incorporates second-order curvature information of the
training loss.
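As a concrete anchor for the recipe described in the abstract, the following LaTeX sketch writes out the standard change-of-measure (Donsker-Varadhan) step that underlies both PAC-Bayesian and mutual-information bounds. The notation (loss bounded in [0,1], prior P, posterior Q, sample size n) is an illustrative assumption and not the paper's exact statement of Zhang's IEI.

```latex
% A minimal sketch of the change-of-measure recipe, assuming a loss in [0,1];
% notation is illustrative, not the paper's exact information exponential inequality.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% Donsker-Varadhan change of measure: for any prior $P$, posterior $Q$,
% and measurable $g$,
\begin{equation}
  \mathbb{E}_{W \sim Q}\,[g(W)]
  \;\le\;
  \mathrm{KL}(Q \,\|\, P) + \log \mathbb{E}_{W \sim P}\!\left[ e^{g(W)} \right].
\end{equation}

% Take $g(w) = \lambda\,(L(w) - \hat L_S(w))$ with population risk $L$ and
% empirical risk $\hat L_S$ on an $n$-point sample $S$. For a loss in $[0,1]$,
% Hoeffding's lemma gives
% $\mathbb{E}_S\,\mathbb{E}_{W \sim P}\, e^{g(W)} \le e^{\lambda^2/(8n)}$,
% and Markov's inequality over $S$ then yields, with probability at least
% $1-\delta$, simultaneously for all posteriors $Q$:
\begin{equation}
  \mathbb{E}_{W \sim Q}[L(W)]
  \;\le\;
  \mathbb{E}_{W \sim Q}[\hat L_S(W)]
  + \frac{\mathrm{KL}(Q \,\|\, P) + \log(1/\delta)}{\lambda}
  + \frac{\lambda}{8n}.
\end{equation}

% Taking expectations over $S$ instead of a high-probability statement, and
% choosing $P$ as the marginal law of $W$, turns the KL term into the mutual
% information $I(W;S)$, which is the mutual-information flavor of the same bound.

\end{document}
```

Choosing lambda from a fixed grid (with a union bound over the grid) recovers, up to constants and logarithmic factors, the familiar sqrt((KL(Q||P) + log(1/delta))/n) rate. The posterior minimizing the right-hand side of such a bound is a Gibbs posterior, Q(w) proportional to P(w) exp(-lambda * L_hat_S(w)). The Python sketch below approximately samples from a Gibbs posterior with stochastic gradient Langevin dynamics on a synthetic least-squares problem; the inverse temperature lam, the Gaussian prior scale sigma, and all other hyperparameters are illustrative assumptions, and this is not the Entropy-SGD or PAC-Bayes-SGD implementation discussed in the paper.

```python
# Toy, hypothetical illustration (not the paper's code): approximate sampling
# from a Gibbs posterior Q(w) proportional to P(w) * exp(-lam * L_hat(w)) with
# a Gaussian prior P = N(0, sigma^2 I), using stochastic gradient Langevin
# dynamics (SGLD) on synthetic least-squares data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

lam, sigma = 50.0, 1.0            # inverse temperature and prior scale (assumed)
eta, steps, batch = 1e-3, 5000, 32

def grad_empirical_loss(w, idx):
    # gradient of the mini-batch squared loss (1/|B|) * sum_i (x_i . w - y_i)^2
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(d)
samples = []
for t in range(steps):
    idx = rng.choice(n, size=batch, replace=False)
    # gradient of the Gibbs potential lam * L_hat(w) - log P(w), up to constants
    g = lam * grad_empirical_loss(w, idx) + w / sigma**2
    w = w - eta * g + np.sqrt(2.0 * eta) * rng.normal(size=d)  # SGLD step
    if t > steps // 2:            # keep post-burn-in iterates as posterior samples
        samples.append(w.copy())

w_gibbs = np.mean(samples, axis=0)     # posterior-mean predictor
print("distance of posterior mean to w_true:", np.linalg.norm(w_gibbs - w_true))
```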
Related papers
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
arXiv Detail & Related papers (2024-06-06T13:15:37Z) - Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets [25.250314934981233]
We first apply the PAC-Bayesian framework on 'random sets' in a rigorous way, where the training algorithm is assumed to output a data-dependent hypothesis set.
This approach allows us to prove data-dependent bounds, which are applicable in numerous contexts.
arXiv Detail & Related papers (2024-04-26T14:28:18Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Information-Theoretic Generalization Bounds for Transductive Learning and its Applications [16.408850979966623]
We develop generalization bounds for transductive learning algorithms in the context of information theory and PAC-Bayesian theory.
Our theoretical results are validated on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-11-08T09:48:42Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - Mitigating the Effect of Incidental Correlations on Part-based Learning [50.682498099720114]
Part-based representations could be more interpretable and generalize better with limited data.
We present two innovative regularization methods for part-based representations.
We exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets.
arXiv Detail & Related papers (2023-09-30T13:44:48Z) - Information Theoretic Lower Bounds for Information Theoretic Upper Bounds [14.268363583731848]
We examine the relationship between the information that the output model carries about the training sample and the generalization of the algorithm, in the context of convex optimization.
Our study reveals that, for true risk minimization, mutual information is necessary.
Existing information-theoretic generalization bounds fall short in capturing the capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.
arXiv Detail & Related papers (2023-02-09T20:42:36Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - PAC-Bayes unleashed: generalisation bounds with unbounded losses [12.078257783674923]
We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions.
This extends the relevance and applicability of the PAC-Bayes learning framework.
arXiv Detail & Related papers (2020-06-12T15:55:46Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.