Learning Algorithm Generalization Error Bounds via Auxiliary Distributions
- URL: http://arxiv.org/abs/2210.00483v2
- Date: Tue, 16 Apr 2024 22:52:21 GMT
- Title: Learning Algorithm Generalization Error Bounds via Auxiliary Distributions
- Authors: Gholamali Aminian, Saeed Masiha, Laura Toni, Miguel R. D. Rodrigues
- Abstract summary: Generalization error bounds are essential for understanding how well machine learning models work.
In this work, we propose a novel method, the Auxiliary Distribution Method, which leads to new upper bounds on the expected generalization error.
- Score: 16.44492672878356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization error bounds are essential for understanding how well machine learning models work. In this work, we propose a novel method, the Auxiliary Distribution Method, which leads to new upper bounds on the expected generalization error that are appropriate for supervised learning scenarios. We show that our general upper bounds can be specialized, under some conditions, to new bounds involving the $\alpha$-Jensen-Shannon and $\alpha$-R\'enyi ($0< \alpha < 1$) information between a random variable modeling the set of training samples and another random variable modeling the set of hypotheses. Our upper bounds based on $\alpha$-Jensen-Shannon information are also finite. Additionally, we demonstrate how our auxiliary distribution method can be used to derive upper bounds on the excess risk of some learning algorithms in the supervised learning context, as well as on the generalization error under a distribution mismatch scenario, where the mismatch is modeled as the $\alpha$-Jensen-Shannon or $\alpha$-R\'enyi divergence between the distributions of the test and training data samples. We also outline the conditions under which our proposed upper bounds may be tighter than earlier upper bounds.
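The key quantities in these bounds are the $\alpha$-R\'enyi ($0<\alpha<1$) and $\alpha$-Jensen-Shannon information measures. As a minimal, hedged illustration, the sketch below computes the underlying divergences between two discrete distributions; the $\alpha$-Jensen-Shannon definition used here (KL terms against the mixture $\alpha P + (1-\alpha) Q$) is one common convention and may differ in detail from the paper's, and all function names and toy distributions are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """alpha-Renyi divergence D_alpha(p || q) between discrete distributions,
    for 0 < alpha < 1 (the regime appearing in the paper's bounds)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), treating 0*log(0/q) as 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def alpha_js_divergence(p, q, alpha):
    """alpha-Jensen-Shannon divergence built from KL terms against the
    mixture alpha*p + (1-alpha)*q (one common convention; the paper's exact
    definition may differ -- treat this as an illustrative assumption)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = alpha * p + (1.0 - alpha) * q
    return alpha * kl_divergence(p, m) + (1.0 - alpha) * kl_divergence(q, m)

# Toy example: two distributions over three outcomes.
p = [0.7, 0.2, 0.1]
q = [0.3, 0.4, 0.3]
print(renyi_divergence(p, q, alpha=0.5))     # >= 0 and finite for 0 < alpha < 1
print(alpha_js_divergence(p, q, alpha=0.5))  # bounded, hence finite bounds
```

Because the $\alpha$-Jensen-Shannon divergence is built from KL terms against a mixture that dominates both distributions, it stays bounded even when the two distributions do not share support, which is consistent with the abstract's remark that the $\alpha$-Jensen-Shannon based bounds are finite.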
Related papers
- Universal Batch Learning Under The Misspecification Setting [4.772817128620037]
We consider the problem of universal batch learning in a misspecification setting with log-loss.
We derive the optimal universal learner, a mixture over the set of data-generating distributions, and obtain a closed-form expression for the min-max regret.
arXiv Detail & Related papers (2024-05-12T11:16:05Z)
- Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning [55.75959755058356]
In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions.
A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator.
We propose a method called Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values to produce a Gaussian error distribution.
arXiv Detail & Related papers (2024-03-12T14:49:19Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z)
- On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $\phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z)
- Diffusion models as plug-and-play priors [98.16404662526101]
We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary constraint $c(\mathbf{x},\mathbf{y})$.
The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise.
arXiv Detail & Related papers (2022-06-17T21:11:36Z)
- Testing distributional assumptions of learning algorithms [5.204779946147061]
We study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$.
We show that if the distribution on examples in the data passes the tester $\mathcal{T}$, then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data.
arXiv Detail & Related papers (2022-04-14T19:10:53Z)
- Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z)
- Jensen-Shannon Information Based Characterization of the Generalization Error of Learning Algorithms [17.682750327776777]
Generalization error bounds are critical to understanding the performance of machine learning models.
We propose a new information-theoretic based generalization error upper bound applicable to supervised learning scenarios.
arXiv Detail & Related papers (2020-10-23T20:53:07Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in characterizing the generalization error.
arXiv Detail & Related papers (2020-05-18T13:23:20Z)