Error Exponent in Agnostic PAC Learning
- URL: http://arxiv.org/abs/2405.00792v1
- Date: Wed, 1 May 2024 18:08:03 GMT
- Title: Error Exponent in Agnostic PAC Learning
- Authors: Adi Hendel, Meir Feder,
- Abstract summary: Probably Approximately Correct (PAC) is widely used to analyze learning problems and algorithms.
In this paper, we consider PAC learning using the error exponent - a well established analysis method in Information Theory.
We find, under some assumptions, an improved distribution dependent error exponent for a wide range of problems, establishing the exponential behavior of the PAC error probability in learning.
- Score: 4.772817128620037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Statistical learning theory and the Probably Approximately Correct (PAC) criterion are the common approach to mathematical learning theory. PAC is widely used to analyze learning problems and algorithms, and have been studied thoroughly. Uniform worst case bounds on the convergence rate have been well established using, e.g., VC theory or Radamacher complexity. However, in a typical scenario the performance could be much better. In this paper, we consider PAC learning using a somewhat different tradeoff, the error exponent - a well established analysis method in Information Theory - which describes the exponential behavior of the probability that the risk will exceed a certain threshold as function of the sample size. We focus on binary classification and find, under some stability assumptions, an improved distribution dependent error exponent for a wide range of problems, establishing the exponential behavior of the PAC error probability in agnostic learning. Interestingly, under these assumptions, agnostic learning may have the same error exponent as realizable learning. The error exponent criterion can be applied to analyze knowledge distillation, a problem that so far lacks a theoretical analysis.
Related papers
- Is Transductive Learning Equivalent to PAC Learning? [0.9012198585960443]
We show that the PAC and transductive models are essentially equivalent for agnostic binary classification.
We leave as an intriguing open question whether our second result can be extended beyond binary classification to show the transductive and PAC models equivalent more broadly.
arXiv Detail & Related papers (2024-05-08T16:26:49Z) - On Learning Latent Models with Multi-Instance Weak Supervision [57.18649648182171]
We consider a weakly supervised learning scenario where the supervision signal is generated by a transition function $sigma$ labels associated with multiple input instances.
Our problem is met in different fields, including latent structural learning and neuro-symbolic integration.
arXiv Detail & Related papers (2023-06-23T22:05:08Z) - Learnability, Sample Complexity, and Hypothesis Class Complexity for
Regression Models [10.66048003460524]
This work is inspired by the foundation of PAC and is motivated by the existing regression learning issues.
The proposed approach, denoted by epsilon-Confidence Approximately Correct (epsilon CoAC), utilizes Kullback Leibler divergence (relative entropy)
It enables the learner to compare hypothesis classes of different complexity orders and choose among them the optimum with the minimum epsilon.
arXiv Detail & Related papers (2023-03-28T15:59:12Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior:
From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergenceD(mu|mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z) - Excess risk analysis for epistemic uncertainty with application to
variational inference [110.4676591819618]
We present a novel EU analysis in the frequentist setting, where data is generated from an unknown distribution.
We show a relation between the generalization ability and the widely used EU measurements, such as the variance and entropy of the predictive distribution.
We propose new variational inference that directly controls the prediction and EU evaluation performances based on the PAC-Bayesian theory.
arXiv Detail & Related papers (2022-06-02T12:12:24Z) - PACMAN: PAC-style bounds accounting for the Mismatch between Accuracy
and Negative log-loss [28.166066663983674]
The ultimate performance of machine learning algorithms for classification tasks is usually measured in terms of the empirical error probability (or accuracy) based on a testing dataset.
For classification tasks, this loss function is often the negative log-loss that leads to the well-known cross-entropy risk.
We introduce an analysis based on point-wise PAC approach over the generalization gap considering the mismatch of testing based on the accuracy metric and training on the negative log-loss.
arXiv Detail & Related papers (2021-12-10T14:00:22Z) - Characterizing the Generalization Error of Gibbs Algorithm with
Symmetrized KL information [18.92529916180208]
Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory.
Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm.
arXiv Detail & Related papers (2021-07-28T22:20:34Z) - General stochastic separation theorems with optimal bounds [68.8204255655161]
Phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Probably Approximately Correct Constrained Learning [135.48447120228658]
We develop a generalization theory based on the probably approximately correct (PAC) learning framework.
We show that imposing a learner does not make a learning problem harder in the sense that any PAC learnable class is also a constrained learner.
We analyze the properties of this solution and use it to illustrate how constrained learning can address problems in fair and robust classification.
arXiv Detail & Related papers (2020-06-09T19:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.