Generalization Bounds via Information Density and Conditional
Information Density
- URL: http://arxiv.org/abs/2005.08044v5
- Date: Fri, 19 Mar 2021 09:12:12 GMT
- Title: Generalization Bounds via Information Density and Conditional
Information Density
- Authors: Fredrik Hellström and Giuseppe Durisi
- Abstract summary: We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms.
We provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios.
- Score: 14.147617330278662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a general approach, based on an exponential inequality, to derive
bounds on the generalization error of randomized learning algorithms. Using
this approach, we provide bounds on the average generalization error as well as
bounds on its tail probability, for both the PAC-Bayesian and single-draw
scenarios. Specifically, for the case of subgaussian loss functions, we obtain
novel bounds that depend on the information density between the training data
and the output hypothesis. When suitably weakened, these bounds recover many of
the available information-theoretic bounds in the literature. We also extend
the proposed exponential-inequality approach to the setting recently introduced
by Steinke and Zakynthinou (2020), where the learning algorithm depends on a
randomly selected subset of the available training data. For this setup, we
present bounds for bounded loss functions in terms of the conditional
information density between the output hypothesis and the random variable
determining the subset choice, given all training data. Through our approach,
we recover the average generalization bound presented by Steinke and
Zakynthinou (2020) and extend it to the PAC-Bayesian and single-draw scenarios.
For the single-draw scenario, we also obtain novel bounds in terms of the
conditional $\alpha$-mutual information and the conditional maximal leakage.
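As context for the type of bound discussed above (an illustrative sketch in standard notation not defined in this summary: $W$ is the output hypothesis, $Z^n$ the $n$ training samples, $\mathrm{gen}$ the population-minus-empirical loss gap, $\tilde{Z}$ the supersample of Steinke and Zakynthinou, and $S$ the binary random vector selecting the training subset), two representative bounds that this kind of analysis recovers are the average bound of Xu and Raginsky for $\sigma$-subgaussian losses,
$$\big|\mathbb{E}[\mathrm{gen}(W,Z^n)]\big| \le \sqrt{\frac{2\sigma^2}{n}\, I(W;Z^n)},$$
where the mutual information $I(W;Z^n)$ is the expectation of the information density, and the average bound of Steinke and Zakynthinou for losses bounded in $[0,1]$,
$$\big|\mathbb{E}[\mathrm{gen}(W,Z^n(S))]\big| \le \sqrt{\frac{2}{n}\, I(W;S\mid\tilde{Z})},$$
where the conditional mutual information $I(W;S\mid\tilde{Z})$ never exceeds $n\log 2$ (since $S\in\{0,1\}^n$), whereas $I(W;Z^n)$ can be unbounded. These displayed inequalities are standard statements from the cited works, shown here only to illustrate the form of the bounds; the paper's exponential-inequality approach yields such average bounds together with their PAC-Bayesian and single-draw counterparts.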
Related papers
- An Information-Theoretic Approach to Generalization Theory [27.87324770020133]
We analyze information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.
We show that algorithms with a bounded maximal leakage guarantee generalization even with a constant privacy parameter.
arXiv Detail & Related papers (2024-08-20T10:08:21Z)
- Data-dependent Generalization Bounds via Variable-Size Compressibility [16.2444595840653]
We establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework.
In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data.
The new generalization bounds we establish include tail bounds, tail bounds on the expectation, and in-expectation bounds.
arXiv Detail & Related papers (2023-03-09T16:17:45Z)
- Tighter Information-Theoretic Generalization Bounds from Supersamples [27.14107452619853]
We present a variety of novel information-theoretic generalization bounds for learning algorithms.
The presented bounds include square-root bounds and fast-rate bounds, including ones based on variance and sharpness.
We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date on the same supersample setting.
arXiv Detail & Related papers (2023-02-05T17:06:27Z)
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal one.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
- Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization [48.12845778927164]
We consider the prospect of establishing minimax rates for gradient descent in the setting of convex optimization.
We consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm.
Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
arXiv Detail & Related papers (2022-12-27T17:16:48Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- A Discriminative Technique for Multiple-Source Adaptation [55.5865665284915]
We present a new discriminative technique for the multiple-source adaptation (MSA) problem.
Our solution only requires conditional probabilities that can easily be accurately estimated from unlabeled data from the source domains.
Our experiments with real-world applications further demonstrate that our new discriminative MSA algorithm outperforms the previous generative solution.
arXiv Detail & Related papers (2020-08-25T14:06:15Z)
- Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms [41.98752845890157]
We study the proposal by Steinke and Zakynthinou to reason about the generalization error of a learning algorithm by introducing a supersample.
We first show that these new bounds based on the conditional mutual information are tighter than those based on unconditional mutual information.
We apply these bounds to the study of Langevin dynamics, showing that conditioning on the supersample allows us to obtain tighter bounds based on hypothesis tests.
arXiv Detail & Related papers (2020-04-27T17:51:09Z)
- Generalization Error Bounds via $m$th Central Moments of the Information Density [14.147617330278662]
We present a general approach to deriving bounds on the generalization error of randomized learning algorithms.
Our approach can be used to obtain bounds on the average generalization error as well as bounds on its tail probabilities.
arXiv Detail & Related papers (2020-04-20T09:23:49Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.