Class-wise Generalization Error: an Information-Theoretic Analysis
- URL: http://arxiv.org/abs/2401.02904v1
- Date: Fri, 5 Jan 2024 17:05:14 GMT
- Title: Class-wise Generalization Error: an Information-Theoretic Analysis
- Authors: Firas Laakom, Yuheng Bu, Moncef Gabbouj
- Abstract summary: We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
- Score: 22.877440350595222
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing generalization theories of supervised learning typically take a
holistic approach and provide bounds for the expected generalization over the
whole data distribution, which implicitly assumes that the model generalizes
similarly for all the classes. In practice, however, there are significant
variations in generalization performance among different classes, which cannot
be captured by the existing generalization bounds. In this work, we tackle this
problem by theoretically studying the class-generalization error, which
quantifies the generalization performance of each individual class. We derive a
novel information-theoretic bound for class-generalization error using the KL
divergence, and we further obtain several tighter bounds using the conditional
mutual information (CMI), which are significantly easier to estimate in
practice. We empirically validate our proposed bounds in different neural
networks and show that they accurately capture the complex class-generalization
error behavior. Moreover, we show that the theoretical tools developed in this
paper can be applied in several applications beyond this context.
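As a rough illustration of the quantity the abstract describes, the per-class gap can be estimated empirically by comparing held-out and training losses class by class. This is a minimal sketch, not the paper's estimator or its bounds: the function name `classwise_generalization_gap`, the use of a held-out set as a stand-in for the population loss, and the toy 0-1 losses are all assumptions made for illustration.

```python
import numpy as np


def classwise_generalization_gap(train_losses, train_labels, test_losses, test_labels):
    """Empirical per-class gap: mean held-out loss minus mean training loss.

    Hypothetical helper for illustration; the held-out mean is only a
    proxy for the expected (population) loss of each class.
    """
    gaps = {}
    for c in np.unique(train_labels):
        test_mask = test_labels == c
        if not test_mask.any():
            continue  # class absent from the held-out set, no estimate
        train_mean = train_losses[train_labels == c].mean()
        gaps[int(c)] = float(test_losses[test_mask].mean() - train_mean)
    return gaps


# Toy 0-1 losses (assumed data): the model fits every training sample,
# but misclassifies half of the held-out samples of class 1.
train_losses = np.array([0.0, 0.0, 0.0, 0.0])
train_labels = np.array([0, 0, 1, 1])
test_losses = np.array([0.0, 0.0, 1.0, 0.0])
test_labels = np.array([0, 0, 1, 1])

gaps = classwise_generalization_gap(train_losses, train_labels, test_losses, test_labels)
overall_gap = float(test_losses.mean() - train_losses.mean())
print(gaps, overall_gap)
```

In this toy case the overall gap is 0.25, while class 0 has no gap and class 1 has a gap of 0.5, which is the kind of class-level variation a single whole-distribution bound cannot express.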
Related papers
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
- A Unified Approach to Controlling Implicit Regularization via Mirror Descent [18.536453909759544]
Mirror descent (MD) is a notable generalization of gradient descent (GD).
We show that MD can be implemented efficiently and enjoys fast convergence under suitable conditions.
arXiv Detail & Related papers (2023-06-24T03:57:26Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Information-Theoretic Bounds on the Moments of the Generalization Error of Learning Algorithms [19.186110989897738]
Generalization error bounds are critical to understanding the performance of machine learning models.
We offer a more refined analysis of the generalization behaviour of machine learning models, based on a characterization of (bounds on) their generalization error moments.
arXiv Detail & Related papers (2021-02-03T11:38:00Z)
- In Search of Robust Measures of Generalization [79.75709926309703]
We develop bounds on generalization error, optimization error, and excess risk.
When evaluated empirically, most of these bounds are numerically vacuous.
We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
arXiv Detail & Related papers (2020-10-22T17:54:25Z)
- The Role of Mutual Information in Variational Classifiers [47.10478919049443]
We study the generalization error of classifiers relying on encodings trained on the cross-entropy loss.
We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information.
arXiv Detail & Related papers (2020-10-22T12:27:57Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
- Towards a generalization of information theory for hierarchical partitions [0.0]
We introduce a generalization of information theory that works with hierarchical partitions.
In particular, we derive hierarchical generalizations of many other classical information-theoretic quantities.
arXiv Detail & Related papers (2020-02-27T11:47:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.