Robust Generalization via $\alpha$-Mutual Information
- URL: http://arxiv.org/abs/2001.06399v1
- Date: Tue, 14 Jan 2020 11:28:30 GMT
- Title: Robust Generalization via $\alpha$-Mutual Information
- Authors: Amedeo Roberto Esposito, Michael Gastpar, Ibrahim Issa
- Abstract summary: We provide bounds connecting two probability measures of the same event using Rényi $\alpha$-Divergences and Sibson's $\alpha$-Mutual Information.
The results have broad applications, from bounding the generalization error of learning algorithms to the more general framework of adaptive data analysis.
- Score: 24.40306100502023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The aim of this work is to provide bounds connecting two probability measures
of the same event using Rényi $\alpha$-Divergences and Sibson's $\alpha$-Mutual Information,
generalizations of, respectively, the Kullback-Leibler Divergence and Shannon's Mutual
Information. A particular case of interest arises when the two probability measures
considered are a joint distribution and the corresponding product of marginals
(representing the statistically independent scenario). In this case, a bound using
Sibson's $\alpha$-Mutual Information is retrieved, extending a result involving Maximal
Leakage to general alphabets. These results have broad applications, from bounding the
generalization error of learning algorithms to the more general framework of adaptive
data analysis, provided that the divergences and/or information measures used are
amenable to such an analysis (i.e., are robust to post-processing and compose
adaptively). The generalization error bounds are derived with respect to
high-probability events, but a corresponding bound on the expected generalization error
is also retrieved.
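
For orientation, the two information measures named above have standard closed forms; the following displays (written in discrete notation for simplicity, and not taken verbatim from the paper) are a commonly used way to state them:

$$D_\alpha(P\|Q) \;=\; \frac{1}{\alpha-1}\,\log \sum_{x} P(x)^{\alpha}\, Q(x)^{1-\alpha}, \qquad \alpha \in (0,1)\cup(1,\infty),$$

$$I_\alpha(X;Y) \;=\; \min_{Q_Y} D_\alpha\!\left(P_{XY}\,\|\,P_X \times Q_Y\right) \;=\; \frac{\alpha}{\alpha-1}\,\log \sum_{y}\Big(\sum_{x} P_X(x)\,P_{Y|X}(y|x)^{\alpha}\Big)^{1/\alpha}.$$

As $\alpha \to 1$ these recover the Kullback-Leibler Divergence and Shannon's Mutual Information, while as $\alpha \to \infty$ Sibson's $I_\alpha$ tends to the Maximal Leakage $\mathcal{L}(X \to Y) = \log \sum_y \max_{x:\,P_X(x)>0} P_{Y|X}(y|x)$, consistent with the Maximal Leakage result mentioned in the abstract. A small numerical check of these quantities appears after the related-papers list below.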
Related papers
- Learning Linear Causal Representations from Interventions under General Nonlinear Mixing (2023-06-04): We prove strong identifiability results given unknown single-node interventions without access to the intervention targets. This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
- A unified framework for information-theoretic generalization bounds (2023-05-18): This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms. The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces.
- Data-dependent Generalization Bounds via Variable-Size Compressibility (2023-03-09): We establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework, in which the generalization error of an algorithm is linked to a variable-size "compression rate" of its input data. The bounds obtained include tail bounds, tail bounds on the expectation, and in-expectation bounds.
- Tighter Expected Generalization Error Bounds via Convexity of Information Measures (2022-02-24): Generalization error bounds are essential to understanding machine learning algorithms. This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each input training sample.
- Generalization Bounds via Convex Analysis (2022-02-10): We show that it is possible to replace the mutual information by any strongly convex function of the joint input-output distribution. Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance.
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks (2021-06-07): Deep learning models are prone to learning spurious correlations that should not be relied on as predictive clues. We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders, and conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
- An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization (2021-02-25): We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions, and show that ERM is provably minimax-optimal for both tasks.
- The Role of Mutual Information in Variational Classifiers (2020-10-22): We study the generalization error of classifiers relying on encodings trained with the cross-entropy loss, and derive bounds on the generalization error showing that there exists a regime where it is bounded by the mutual information.
- General stochastic separation theorems with optimal bounds (2020-10-11): The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities: errors or clusters of errors can be separated from the rest of the data. The ability to correct an AI system also opens up the possibility of an attack on it, and high dimensionality induces vulnerabilities caused by the same separability.
- Good Classifiers are Abundant in the Interpolating Regime (2020-06-22): We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
- Generic Error Bounds for the Generalized Lasso with Sub-Exponential Data (2020-04-11): This work performs a non-asymptotic analysis of the generalized Lasso under the assumption of sub-exponential data. We show that the estimation error can be controlled by means of two complexity parameters that arise naturally from a generic-chaining-based proof strategy.
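
As a concrete illustration of the measures defined after the abstract above, here is a minimal numerical sketch (not taken from any of the listed papers; the function names and the toy distribution are illustrative assumptions) that computes Sibson's $\alpha$-Mutual Information for a discrete joint distribution and checks its $\alpha \to 1$ (Shannon Mutual Information) and large-$\alpha$ (Maximal Leakage) behaviour, assuming NumPy is available:

```python
# Minimal sketch: Sibson's alpha-mutual information for a discrete joint distribution.
# Toy distribution and function names are illustrative, not from the paper.
import numpy as np

def sibson_mi(p_x, p_y_given_x, alpha):
    """I_alpha(X;Y) = alpha/(alpha-1) * log sum_y (sum_x p(x) p(y|x)^alpha)^(1/alpha), in nats."""
    inner = (p_x[:, None] * p_y_given_x ** alpha).sum(axis=0)  # one term per y
    return alpha / (alpha - 1.0) * np.log((inner ** (1.0 / alpha)).sum())

def shannon_mi(p_x, p_y_given_x):
    """Shannon mutual information I(X;Y), in nats."""
    p_xy = p_x[:, None] * p_y_given_x
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    return (p_xy[mask] * np.log(p_xy[mask] / np.outer(p_x, p_y)[mask])).sum()

def maximal_leakage(p_x, p_y_given_x):
    """Maximal leakage L(X -> Y) = log sum_y max_{x: p(x)>0} p(y|x), in nats."""
    return np.log(p_y_given_x[p_x > 0].max(axis=0).sum())

# Toy channel: rows index x, columns index y; each row of p_y_given_x sums to 1.
p_x = np.array([0.5, 0.3, 0.2])
p_y_given_x = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.2, 0.3, 0.5]])

print(sibson_mi(p_x, p_y_given_x, 1.001), shannon_mi(p_x, p_y_given_x))       # nearly equal
print(sibson_mi(p_x, p_y_given_x, 100.0), maximal_leakage(p_x, p_y_given_x))  # nearly equal
# For very large alpha, p(y|x)**alpha underflows; a log-sum-exp implementation avoids that.
```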
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.