Sharpened Generalization Bounds based on Conditional Mutual Information
and an Application to Noisy, Iterative Algorithms
- URL: http://arxiv.org/abs/2004.12983v2
- Date: Fri, 23 Oct 2020 14:10:41 GMT
- Title: Sharpened Generalization Bounds based on Conditional Mutual Information
and an Application to Noisy, Iterative Algorithms
- Authors: Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy, Gintare
Karolina Dziugaite
- Abstract summary: We study the proposal, by Steinke and Zakynthinou, to reason about the generalization error of a learning algorithm by introducing a super sample.
We first show that these new bounds based on the conditional mutual information are tighter than those based on unconditional mutual information.
We apply these bounds to the study of Langevin dynamics, showing that conditioning on the super sample allows us to obtain tighter bounds based on hypothesis tests.
- Score: 41.98752845890157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The information-theoretic framework of Russo and J. Zou (2016) and Xu and
Raginsky (2017) provides bounds on the generalization error of a learning
algorithm in terms of the mutual information between the algorithm's output and
the training sample. In this work, we study the proposal, by Steinke and
Zakynthinou (2020), to reason about the generalization error of a learning
algorithm by introducing a super sample that contains the training sample as a
random subset and computing mutual information conditional on the super sample.
We first show that these new bounds based on the conditional mutual information
are tighter than those based on the unconditional mutual information. We then
introduce yet tighter bounds, building on the "individual sample" idea of Bu,
S. Zou, and Veeravalli (2019) and the "data dependent" ideas of Negrea et al.
(2019), using disintegrated mutual information. Finally, we apply these bounds
to the study of the Langevin dynamics algorithm, showing that conditioning on the
super sample allows us to exploit information in the optimization trajectory to
obtain tighter bounds based on hypothesis tests.
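As a rough illustration of the two regimes compared in the abstract (a schematic sketch in conventional notation, not the paper's exact statements): for a training sample $S$ of size $n$, algorithm output $W$, a supersample $\tilde{Z}$ of $2n$ points, and selection bits $U \in \{0,1\}^n$ indicating which half of $\tilde{Z}$ forms $S$, the unconditional and conditional bounds take the forms
$$ \bigl|\mathbb{E}[\operatorname{gen}(W,S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)} \qquad \text{(Xu--Raginsky, $\sigma$-sub-Gaussian loss),} $$
$$ \bigl|\mathbb{E}[\operatorname{gen}(W,S)]\bigr| \;\le\; \sqrt{\frac{2}{n}\, I(W;U \mid \tilde{Z})} \qquad \text{(Steinke--Zakynthinou, loss bounded in $[0,1]$).} $$
The conditional term $I(W;U\mid\tilde{Z})$ is always at most $n\log 2$, whereas $I(W;S)$ can be infinite; the sharpening summarized above shows that bounds of the second kind are never looser than those of the first.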
Related papers
- An Information-Theoretic Approach to Generalization Theory [27.87324770020133]
We analyze information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.
We show that algorithms with a bounded maximal leakage guarantee generalization even with a constant privacy parameter.
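For context (a conventional definition, not quoted from the paper): for a discrete output $W$ and sample $S$, maximal leakage is
$$ \mathcal{L}(S \to W) \;=\; \log \sum_{w} \max_{s:\, P_S(s) > 0} P_{W \mid S}(w \mid s), $$
and the claim summarized above is that algorithms keeping this quantity bounded enjoy generalization guarantees even when the privacy-style parameter is only a constant.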
arXiv Detail & Related papers (2024-08-20T10:08:21Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
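Schematically (notation assumed for illustration, not taken from the paper), the objective and guarantee can be written as
$$ \min_{h \in \mathcal{H}} \; \max_{1 \le i \le k} \; \mathbb{E}_{z \sim \mathcal{D}_i}\bigl[\ell(h, z)\bigr], \qquad \text{with sample complexity } \widetilde{O}\!\left(\frac{d+k}{\varepsilon^{2}}\right), $$
where $d$ measures the complexity of the hypothesis class $\mathcal{H}$ (e.g., its VC dimension) and the returned hypothesis is randomized.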
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Information Theoretic Lower Bounds for Information Theoretic Upper
Bounds [14.268363583731848]
We examine how the mutual information between the output model and the training sample relates to the generalization of the algorithm, in the context of stochastic convex optimization.
Our study reveals that, for true risk minimization, mutual information is necessary.
Existing information-theoretic generalization bounds fall short of capturing the capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.
arXiv Detail & Related papers (2023-02-09T20:42:36Z) - Tighter Information-Theoretic Generalization Bounds from Supersamples [27.14107452619853]
We present a variety of novel information-theoretic generalization bounds for learning algorithms.
The presented bounds include square-root bounds and fast-rate bounds, including bounds based on variance and sharpness.
We show, theoretically or empirically, that these bounds are tighter than all information-theoretic bounds known to date in the same supersample setting.
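As a schematic contrast (forms assumed for illustration, not the paper's exact theorems): writing $\mathrm{CMI}_n$ for the relevant supersample-conditional information measure, square-root bounds scale as
$$ \mathbb{E}[\operatorname{gen}] \;\lesssim\; \sqrt{\frac{\mathrm{CMI}_n}{n}}, \qquad \text{while fast-rate bounds scale as} \qquad \mathbb{E}[\operatorname{gen}] \;\lesssim\; \frac{\mathrm{CMI}_n}{n} + c \cdot \widehat{\mathrm{err}}, $$
with the linear-rate form becoming advantageous when an empirical-error, variance, or sharpness term $\widehat{\mathrm{err}}$ is small (e.g., for near-interpolating algorithms).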
arXiv Detail & Related papers (2023-02-05T17:06:27Z) - STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal one.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
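For reference, the kernelized Stein discrepancy used there is conventionally defined (standard form, not quoted from the paper) as the IPM over the unit ball of a vector-valued RKHS $\mathcal{H}^d$:
$$ \mathrm{KSD}(q \,\|\, p) \;=\; \sup_{\|f\|_{\mathcal{H}^d} \le 1} \; \mathbb{E}_{x \sim q}\bigl[\nabla_x \log p(x)^{\top} f(x) + \nabla_x \cdot f(x)\bigr], $$
which is attractive in this setting because it depends on $p$ only through $\nabla \log p$ and so avoids normalizing constants.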
arXiv Detail & Related papers (2023-01-28T00:49:28Z) - Tight Guarantees for Interactive Decision Making with the
Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z) - Limitations of Information-Theoretic Generalization Bounds for Gradient
Descent Methods in Stochastic Convex Optimization [48.12845778927164]
We consider the prospect of establishing minimax rates for gradient descent in the setting of stochastic convex optimization.
We consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm.
Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
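A minimal sketch of that surrogate construction (hypothetical function names and hyperparameters, not the paper's code): run plain gradient descent on the training objective and release the final iterate corrupted by Gaussian noise, so that the output distribution is smooth and mutual-information (or CMI) quantities are finite.

```python
import numpy as np

def noisy_gd_surrogate(grad, w0, lr=0.1, steps=100, sigma=0.05, rng=None):
    """Gradient descent followed by Gaussian smoothing of the final iterate.

    grad  : callable mapping a parameter vector to the training-set gradient
    w0    : initial parameter vector
    sigma : std. dev. of the Gaussian noise added to the released iterate
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w)                # deterministic GD on the sample
    return w + sigma * rng.standard_normal(w.shape)  # noisy "surrogate" output
```

The added noise is what makes information-theoretic quantities for the output well defined; the summarized paper argues that bounds obtained this way for the surrogate can still fail to certify the minimax rate of the underlying gradient method.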
arXiv Detail & Related papers (2022-12-27T17:16:48Z) - Generalization Bounds via Information Density and Conditional
Information Density [14.147617330278662]
We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms.
We provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios.
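For context (standard definitions assumed, not quoted from the paper): the information density between output $W$ and sample $S$ is
$$ \imath(W, S) \;=\; \log \frac{dP_{WS}}{d(P_W \otimes P_S)}(W, S), \qquad \mathbb{E}\bigl[\imath(W,S)\bigr] = I(W; S), $$
so average-generalization bounds involve the mean of $\imath$ (the mutual information), while tail and single-draw bounds control $\imath$ itself via the exponential inequality; the conditional variants replace $\imath$ with its supersample-conditional analogue.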
arXiv Detail & Related papers (2020-05-16T17:04:24Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)