Tighter Information-Theoretic Generalization Bounds from Supersamples
- URL: http://arxiv.org/abs/2302.02432v3
- Date: Thu, 15 Jun 2023 05:46:14 GMT
- Title: Tighter Information-Theoretic Generalization Bounds from Supersamples
- Authors: Ziqiao Wang, Yongyi Mao
- Abstract summary: We present a variety of novel information-theoretic generalization bounds for learning algorithms.
The presented bounds include square-root bounds and fast-rate bounds, including those based on variance and sharpness.
We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date on the same supersample setting.
- Score: 27.14107452619853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present a variety of novel information-theoretic
generalization bounds for learning algorithms, from the supersample setting of
Steinke & Zakynthinou (2020), i.e., the setting of the "conditional mutual
information" framework. Our development exploits projecting the loss pair
(obtained from a training instance and a testing instance) down to a single
number and correlating loss values with a Rademacher sequence (and its shifted
variants). The presented bounds include square-root bounds, fast-rate bounds,
including those based on variance and sharpness, and bounds for interpolating
algorithms etc. We show theoretically or empirically that these bounds are
tighter than all information-theoretic bounds known to date on the same
supersample setting.
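For orientation, here is a minimal sketch of the supersample construction and of the flavor of a square-root, loss-difference bound; the notation and the final inequality are illustrative (constants and the exact information measures vary across the paper's results).
```latex
% Supersample setting (Steinke & Zakynthinou, 2020), sketched:
% \tilde{Z} holds n pairs of instances; the bit U_i selects which element of pair i is trained on.
\[
\tilde{Z} = \bigl(\tilde{Z}_{i,0}, \tilde{Z}_{i,1}\bigr)_{i=1}^{n}, \qquad
U = (U_1, \dots, U_n) \sim \mathrm{Unif}\{0,1\}^{n}, \quad U \perp \tilde{Z}.
\]
% Project the loss pair of a training/testing instance down to a single number:
\[
\Delta L_i \;=\; \ell\bigl(W, \tilde{Z}_{i,1}\bigr) \;-\; \ell\bigl(W, \tilde{Z}_{i,0}\bigr).
\]
% A square-root bound of the flavor obtained in the paper, for losses bounded in [0,1]:
\[
\bigl|\mathbb{E}[\mathrm{gen}]\bigr| \;\lesssim\; \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\sqrt{2\, I(\Delta L_i ;\, U_i)}.
\]
```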
Related papers
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
The second strategy leverages a novel Re-Ranking technique, which has a lower worst-case time complexity and reduces the memory complexity from O(n^2) to O(kn) with k ≪ n.
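The re-ranking procedure itself is not described in the summary above; the hedged sketch below only illustrates the generic memory trade-off being claimed, i.e. storing each sample's top-k neighbors (O(kn)) rather than the full n x n similarity matrix (O(n^2)). All names are hypothetical, and the similarity is assumed to be an inner product over float embeddings.
```python
import numpy as np

def topk_neighbors(features: np.ndarray, k: int):
    """Keep only each sample's k nearest neighbors: O(k*n) memory rather
    than the O(n^2) cost of storing the full pairwise similarity matrix."""
    n = features.shape[0]
    idx = np.empty((n, k), dtype=np.int64)
    sim = np.empty((n, k), dtype=features.dtype)
    for i in range(n):                        # one similarity row at a time
        scores = features @ features[i]       # inner-product similarities to sample i
        scores[i] = -np.inf                   # exclude the trivial self-match
        top = np.argpartition(scores, -k)[-k:]      # unordered top-k indices
        order = np.argsort(scores[top])[::-1]       # sort them by descending similarity
        idx[i], sim[i] = top[order], scores[top][order]
    return idx, sim                           # (n, k) neighbor ids and similarities
```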
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Fast Rate Information-theoretic Bounds on Generalization Errors [8.102199960821165]
The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data.
Various information-theoretic bounds on the generalization error have been derived in the literature.
This paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size.
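As background for the rate discussion, the generalization error and the canonical square-root mutual-information bound (Xu & Raginsky, 2017) can be written as below; the fast-rate results surveyed in that paper tighten the dependence on the sample size n, and the display is a notational sketch rather than their exact statements.
```latex
% Generalization error of hypothesis W trained on S = (Z_1, ..., Z_n), Z_i i.i.d. from \mu:
\[
\mathrm{gen}(W, S) \;=\; \mathbb{E}_{Z \sim \mu}\bigl[\ell(W, Z)\bigr] \;-\; \frac{1}{n} \sum_{i=1}^{n} \ell(W, Z_i).
\]
% Classical square-root bound for a \sigma-sub-Gaussian loss (rate O(1/\sqrt{n}) when I(W;S) is bounded):
\[
\bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)}.
\]
```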
arXiv Detail & Related papers (2023-03-26T08:59:05Z) - Information Theoretic Lower Bounds for Information Theoretic Upper
Bounds [14.268363583731848]
We examine the relationship between the mutual information between the output model and the empirical sample, and the generalization of the algorithm, in the context of convex optimization.
Our study reveals that, for true risk minimization, mutual information is necessary.
Existing information-theoretic generalization bounds fall short in capturing the capabilities of algorithms like SGD and regularized empirical risk minimization, which have dimension-independent sample complexity.
arXiv Detail & Related papers (2023-02-09T20:42:36Z) - Limitations of Information-Theoretic Generalization Bounds for Gradient
Descent Methods in Stochastic Convex Optimization [48.12845778927164]
We consider the prospect of establishing minimax rates for gradient descent in the setting of convex optimization.
We consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm.
Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
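The noisy "surrogate" tactic can be sketched as follows, assuming a loss that is L-Lipschitz in the parameters and \sigma_\ell-sub-Gaussian in the data; the constants are illustrative, not the paper's exact statement.
```latex
% Perturb the final iterate W_T of gradient descent with isotropic Gaussian noise:
\[
\tilde{W}_T \;=\; W_T + \xi, \qquad \xi \sim \mathcal{N}\bigl(0, \sigma^{2} I_d\bigr), \quad \xi \perp (W_T, S),
\]
% then trade an information term for the surrogate against a perturbation term:
\[
\bigl|\mathbb{E}[\mathrm{gen}(W_T, S)]\bigr|
  \;\le\; \bigl|\mathbb{E}[\mathrm{gen}(\tilde{W}_T, S)]\bigr| + 2 L\, \mathbb{E}\|\xi\|
  \;\le\; \sqrt{\tfrac{2\sigma_\ell^{2}}{n}\, I(\tilde{W}_T; S)} + 2 L \sigma \sqrt{d}.
\]
```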
arXiv Detail & Related papers (2022-12-27T17:16:48Z) - Information-Theoretic Generalization Bounds for Iterative
Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
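The iterative SSL algorithms analyzed there are, in spirit, pseudo-labeling loops; the sketch below shows the generic iteration, with `fit` and `predict` standing in for an arbitrary base learner (hypothetical names, not the paper's API).
```python
import numpy as np

def iterative_ssl(X_l, y_l, X_u, fit, predict, num_iters: int):
    """Generic self-training: each round refits on the labeled data plus
    pseudo-labels produced by the previous round's model."""
    model = fit(X_l, y_l)                      # iteration 0: purely supervised fit
    for _ in range(num_iters):
        y_pseudo = predict(model, X_u)         # pseudo-label the unlabeled pool
        model = fit(np.concatenate([X_l, X_u]),
                    np.concatenate([y_l, y_pseudo]))
    return model
```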
arXiv Detail & Related papers (2021-10-03T05:38:49Z) - Are Missing Links Predictable? An Inferential Benchmark for Knowledge
Graph Completion [79.07695173192472]
InferWiki improves upon existing benchmarks in inferential ability, assumptions, and patterns.
Each testing sample is predictable with supportive data in the training set.
In experiments, we curate two settings of InferWiki varying in sizes and structures, and apply the construction process on CoDEx as comparative datasets.
arXiv Detail & Related papers (2021-08-03T09:51:15Z) - Tighter expected generalization error bounds via Wasserstein distance [23.52237892358981]
We introduce several expected generalization error bounds based on the Wasserstein distance.
We present full-dataset, single-letter, and random-subset bounds on both the standard setting and the randomized-subsample setting.
We show how various new bounds based on different information measures can be derived from the presented bounds.
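A representative single-letter bound of this type, for a loss that is L-Lipschitz in the data argument, reads roughly as follows (a sketch; see the paper for the exact conditions and the randomized-subsample variants):
```latex
% Individual-sample ("single-letter") Wasserstein bound; \mathbb{W} is the order-1
% Wasserstein distance under the metric for which the loss is L-Lipschitz:
\[
\bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr|
  \;\le\; \frac{L}{n} \sum_{i=1}^{n} \mathbb{E}_{Z_i}\Bigl[\mathbb{W}\bigl(P_{W \mid Z_i},\, P_{W}\bigr)\Bigr].
\]
```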
arXiv Detail & Related papers (2021-01-22T20:13:59Z) - Generalization Bounds via Information Density and Conditional
Information Density [14.147617330278662]
We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms.
We provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios.
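The central quantity in that approach is the information density of the hypothesis W and the training data S; for reference, with notation sketched here:
```latex
% Information density of (W, S); its expectation is the mutual information I(W; S):
\[
\imath(w, s) \;=\; \log \frac{\mathrm{d} P_{W S}}{\mathrm{d}\bigl(P_{W} \otimes P_{S}\bigr)}(w, s),
\qquad \mathbb{E}\bigl[\imath(W, S)\bigr] \;=\; I(W; S).
\]
% The exponential inequality controls moments of \imath (or its conditional analogue),
% yielding both in-expectation bounds and tail bounds (single-draw / PAC-Bayesian).
```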
arXiv Detail & Related papers (2020-05-16T17:04:24Z) - Sharpened Generalization Bounds based on Conditional Mutual Information
and an Application to Noisy, Iterative Algorithms [41.98752845890157]
We study the proposal, by Steinke and Zakynthinou, to reason about the generalization error of a learning algorithm by introducing a super sample.
We first show that these new bounds based on the conditional mutual information are tighter than those based on unconditional mutual information.
We apply these bounds to the study of Langevin dynamics, showing that conditioning on the super sample allows us to obtain tighter bounds based on hypothesis tests.
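In supersample notation, the conditional mutual information (CMI) in question is the quantity below; one reason CMI-based bounds behave better is that it is always finite, unlike the unconditional I(W; S):
```latex
% CMI of algorithm A in the supersample setting: \tilde{Z} is the supersample,
% U the selection bits, and W the trained hypothesis.
\[
\mathrm{CMI}(\mathcal{A}) \;=\; I\bigl(W ;\, U \,\bigm|\, \tilde{Z}\bigr) \;\le\; n \log 2,
\]
% whereas I(W; S) can be infinite (e.g. for deterministic algorithms on continuous data).
```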
arXiv Detail & Related papers (2020-04-27T17:51:09Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z) - Domain Adaptation: Learning Bounds and Algorithms [80.85426994513541]
We introduce a novel distance between distributions, discrepancy distance, that is tailored to adaptation problems with arbitrary loss functions.
We derive novel generalization bounds for domain adaptation for a wide family of loss functions.
We also present a series of novel adaptation bounds for large classes of regularization-based algorithms.
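The discrepancy distance referred to above, for a hypothesis set H and a loss function L, has the following standard form (sketched here from that paper's definition):
```latex
% Discrepancy distance between distributions P and Q, relative to hypothesis set H and loss L:
\[
\mathrm{disc}_{L}(P, Q) \;=\; \max_{h, h' \in H}\,
  \Bigl|\, \mathbb{E}_{x \sim P}\bigl[L\bigl(h'(x), h(x)\bigr)\bigr]
        \;-\; \mathbb{E}_{x \sim Q}\bigl[L\bigl(h'(x), h(x)\bigr)\bigr] \Bigr|.
\]
```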
arXiv Detail & Related papers (2009-02-19T18:42:16Z)