Generalization Bounds via Convex Analysis
- URL: http://arxiv.org/abs/2202.04985v1
- Date: Thu, 10 Feb 2022 12:30:45 GMT
- Title: Generalization Bounds via Convex Analysis
- Authors: Gergely Neu, G\'abor Lugosi
- Abstract summary: We show that it is possible to replace the mutual information by any strongly convex function of the joint input-output distribution.
Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance.
- Score: 12.411844611718958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the celebrated works of Russo and Zou (2016,2019) and Xu and Raginsky
(2017), it has been well known that the generalization error of supervised
learning algorithms can be bounded in terms of the mutual information between
their input and the output, given that the loss of any fixed hypothesis has a
subgaussian tail. In this work, we generalize this result beyond the standard
choice of Shannon's mutual information to measure the dependence between the
input and the output. Our main result shows that it is indeed possible to
replace the mutual information by any strongly convex function of the joint
input-output distribution, with the subgaussianity condition on the losses
replaced by a bound on an appropriately chosen norm capturing the geometry of
the dependence measure. This allows us to derive a range of generalization
bounds that are either entirely new or strengthen previously known ones.
Examples include bounds stated in terms of $p$-norm divergences and the
Wasserstein-2 distance, which are respectively applicable for heavy-tailed loss
distributions and highly smooth loss functions. Our analysis is entirely based
on elementary tools from convex analysis by tracking the growth of a potential
function associated with the dependence measure and the loss function.
Related papers
- General Loss Functions Lead to (Approximate) Interpolation in High
Dimensions [6.738946307589741]
We provide a unified framework to approximately characterize the implicit bias of gradient descent in closed form.
Specifically, we show that the implicit bias is approximated (but not exactly equal to) the minimum-norm in high dimensions.
Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings.
arXiv Detail & Related papers (2023-03-13T21:23:12Z) - Data-dependent Generalization Bounds via Variable-Size Compressibility [16.2444595840653]
We establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework.
In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data.
Our new generalization bounds that we establish are tail bounds, tail bounds on the expectation, and in-expectations bounds.
arXiv Detail & Related papers (2023-03-09T16:17:45Z) - On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z) - Chained Generalisation Bounds [26.043342234937747]
We derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
We establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts.
arXiv Detail & Related papers (2022-03-02T09:34:36Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extend these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - An Indirect Rate-Distortion Characterization for Semantic Sources:
General Model and the Case of Gaussian Observation [83.93224401261068]
Source model is motivated by the recent surge of interest in the semantic aspect of information.
intrinsic state corresponds to the semantic feature of the source, which in general is not observable.
Rate-distortion function is the semantic rate-distortion function of the source.
arXiv Detail & Related papers (2022-01-29T02:14:24Z) - Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
Minimax game-theoretic formulation represents a fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z) - Exponentially Weighted l_2 Regularization Strategy in Constructing
Reinforced Second-order Fuzzy Rule-based Model [72.57056258027336]
In the conventional Takagi-Sugeno-Kang (TSK)-type fuzzy models, constant or linear functions are usually utilized as the consequent parts of the fuzzy rules.
We introduce an exponential weight approach inspired by the weight function theory encountered in harmonic analysis.
arXiv Detail & Related papers (2020-07-02T15:42:15Z) - Optimal Bounds between $f$-Divergences and Integral Probability Metrics [8.401473551081748]
Families of $f$-divergences and Integral Probability Metrics are widely used to quantify similarity between probability distributions.
We systematically study the relationship between these two families from the perspective of convex duality.
We obtain new bounds while also recovering in a unified manner well-known results, such as Hoeffding's lemma.
arXiv Detail & Related papers (2020-06-10T17:39:11Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best fitting ReLU with respect to square loss given to draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z) - Robust Generalization via $\alpha$-Mutual Information [24.40306100502023]
bounds connecting two probability measures of the same event using R'enyi $alpha$-Divergences and Sibson's $alpha$-Mutual Information.
Results have broad applications, from bounding the generalization error of learning algorithms to the more general framework of adaptive data analysis.
arXiv Detail & Related papers (2020-01-14T11:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.