Related papers: Do PAC-Learners Learn the Marginal Distribution?

Do PAC-Learners Learn the Marginal Distribution?

URL: http://arxiv.org/abs/2302.06285v3
Date: Tue, 18 Feb 2025 23:50:17 GMT
Title: Do PAC-Learners Learn the Marginal Distribution?
Authors: Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan,
Abstract summary: The Fundamental Theorem of PAC Learning asserts that learnability of a concept class $H$ is equivalent to the $textituniform convergence$ of empirical error in $H$.<n>This work revisits the connection between PAC learning, uniform convergence, and density estimation beyond the distribution-free setting.
Score: 19.54058590042626
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Fundamental Theorem of PAC Learning asserts that learnability of a concept class $H$ is equivalent to the $\textit{uniform convergence}$ of empirical error in $H$ to its mean, or equivalently, to the problem of $\textit{density estimation}$, learnability of the underlying marginal distribution with respect to events in $H$. This seminal equivalence relies strongly on PAC learning's `distribution-free' assumption, that the adversary may choose any marginal distribution over data. Unfortunately, the distribution-free model is known to be overly adversarial in practice, failing to predict the success of modern machine learning algorithms, but without the Fundamental Theorem our theoretical understanding of learning under distributional constraints remains highly limited. In this work, we revisit the connection between PAC learning, uniform convergence, and density estimation beyond the distribution-free setting when the adversary is restricted to choosing a marginal distribution from a known family $\mathscr{P}$. We prove that while the traditional Fundamental Theorem indeed fails, a finer-grained connection between the three fundamental notions continues to hold: 1. PAC-Learning is strictly sandwiched between two refined models of density estimation, differing only in whether the learner $\textit{knows}$ the set of well-estimated events in $H$. 2. Under reasonable assumptions on $H$ and $\mathscr{P}$, density estimation is equivalent to $\textit{uniform estimation}$, a relaxation of uniform convergence allowing non-empirical estimators. Together, our results give a clearer picture of how the Fundamental Theorem extends beyond the distribution-free setting and shed new light on the classically challenging problem of learning under arbitrary distributional assumptions.

Related papers

A Distributional-Lifting Theorem for PAC Learning [16.985620991607345]
Distributional assumptions facilitate the design of efficient algorithms but limit their reach and relevance.<n>Recent work of Blanc, Lange, Malik, and Tan considered the special case of lifting uniform-distribution learners.<n>We show that their approach is information-theoretically intractable with access only to random examples.<n>We then take a different approach that sidesteps the need to learn $Dstar$, yielding a lifter that works in the standard PAC model.
arXiv Detail & Related papers (2025-06-19T23:28:38Z)
On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning [57.18649648182171]
We make contributions towards addressing a problem that hasn't been studied so far in the context of MI-PLL. We derive class-specific risk bounds for MI-PLL, while making minimal assumptions. Our theory reveals a unique phenomenon: that $sigma$ can greatly impact learning imbalances.
arXiv Detail & Related papers (2024-07-13T20:56:34Z)
Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension [17.485243410774814]
In traditional models of supervised learning, the goal of a learner is to output a hypothesis that is competitive (to within $epsilon$) of the best fitting concept from some class. We introduce a smoothed-analysis framework that requires a learner to compete only with the best agnostic. We obtain the first algorithm forally learning intersections of $k$-halfspaces in time.
arXiv Detail & Related papers (2024-07-01T04:58:36Z)
Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(varepsilon,delta)$PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z)
Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L2$ and $Psi_p$ are comparable on our hypothesis class $mathscrF$, $mathscrF$ is a weakly sub-Gaussian class. Our result holds whether the problem is realizable or not and we refer to this as a emphnear mixing-free rate, since direct dependence on mixing is relegated to an additive higher order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z)
Beyond Expectations: Learning with Stochastic Dominance Made Practical [88.06211893690964]
dominance models risk-averse preferences for decision making with uncertain outcomes. Despite theoretically appealing, the application of dominance in machine learning has been scarce. We first generalize the dominance concept to enable feasible comparisons between any arbitrary pair of random variables. We then develop a simple and efficient approach for finding the optimal solution in terms of dominance.
arXiv Detail & Related papers (2024-02-05T03:21:23Z)
The Sample Complexity of Multi-Distribution Learning for VC Classes [25.73730126599202]
Multi-distribution learning is a generalization of PAC learning to settings with multiple data distributions. There remains a significant gap between the known upper and lower bounds for PAC-learnable classes. We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.
arXiv Detail & Related papers (2023-07-22T18:02:53Z)
Differentially-Private Bayes Consistency [70.92545332158217]
We construct a Bayes consistent learning rule that satisfies differential privacy (DP) We prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal sample complexity.
arXiv Detail & Related papers (2022-12-08T11:57:30Z)
Learning versus Refutation in Noninteractive Local Differential Privacy [133.80204506727526]
We study two basic statistical tasks in non-interactive local differential privacy (LDP): learning and refutation. Our main result is a complete characterization of the sample complexity of PAC learning for non-interactive LDP protocols.
arXiv Detail & Related papers (2022-10-26T03:19:24Z)
When are Local Queries Useful for Robust Learning? [25.832511407411637]
We study learning models where the learner is given more power through the use of local queries. We give the first distribution-free algorithms that perform robust empirical risk minimization. We finish by giving robust learning algorithms for halfspaces on $0,1n$ and then obtaining robustness guarantees for halfspaces in $mathbbRn$ against precision-bounded adversaries.
arXiv Detail & Related papers (2022-10-12T11:04:22Z)
On Optimal Learning Under Targeted Data Poisoning [48.907813854832206]
In this work we aim to characterize the smallest achievable error $epsilon=epsilon(eta)$ by the learner in the presence of such an adversary. Remarkably, we show that the upper bound can be attained by a deterministic learner.
arXiv Detail & Related papers (2022-10-06T06:49:48Z)
Quantization for decentralized learning under subspace constraints [61.59416703323886]
We consider decentralized optimization problems where agents have individual cost functions to minimize subject to subspace constraints. We propose and study an adaptive decentralized strategy where the agents employ differential randomized quantizers to compress their estimates. The analysis shows that, under some general conditions on the quantization noise, the strategy is stable both in terms of mean-square error and average bit rate.
arXiv Detail & Related papers (2022-09-16T09:38:38Z)
Cryptographic Hardness of Learning Halfspaces with Massart Noise [59.8587499110224]
We study the complexity of PAC learning halfspaces in the presence of Massart noise. We show that no-time Massart halfspace learners can achieve error better than $Omega(eta)$, even if the optimal 0-1 error is small.
arXiv Detail & Related papers (2022-07-28T17:50:53Z)
Robust PAC$^m$: Training Ensemble Models Under Misspecification and Outliers [46.38465729190199]
PAC-Bayes theory demonstrates that the free energy criterion minimized by Bayesian learning is a bound on the generalization error for Gibbs predictors. This work presents a novel robust free energy criterion that combines the generalized score function with PAC$m$ ensemble bounds.
arXiv Detail & Related papers (2022-03-03T17:11:07Z)
A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms. We develop a general framework from the perspective of Bregman minimization divergence. We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed. We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all true distributions'' that are $alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z)
Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective [26.704446184314506]
We study the problem of measuring the fairness of a machine learning model under noisy information. We present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible.
arXiv Detail & Related papers (2021-05-20T18:36:28Z)
Provable Robustness of Adversarial Training for Learning Halfspaces with Noise [95.84614821570283]
We analyze the properties of adversarial learning adversarially robust halfspaces in the presence of label noise. To the best of our knowledge, this is the first work to show that adversarial training prov yields classifiers in noise.
arXiv Detail & Related papers (2021-04-19T16:35:38Z)
On Agnostic PAC Learning using $\mathcal{L}_2$-polynomial Regression and Fourier-based Algorithms [10.66048003460524]
We develop a framework using Hilbert spaces as a proxy to analyze PAC learning problems with structural properties. We demonstrate that PAC learning with 0-1 loss is equivalent to an optimization in the Hilbert space domain.
arXiv Detail & Related papers (2021-02-11T21:28:55Z)
Arbitrary Conditional Distributions with Energy [11.081460215563633]
A more general and useful problem is arbitrary conditional density estimation. We propose a novel method, Arbitrary Conditioning with Energy (ACE), that can simultaneously estimate the distribution $p(mathbfx_u mid mathbfx_o)$. We also simplify the learning problem by only learning one-dimensional conditionals, from which more complex distributions can be recovered during inference.
arXiv Detail & Related papers (2021-02-08T18:36:26Z)
Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise. We show that there an exponential gap between the information-theoretically optimal error and the best error that can be achieved by a SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z)
Reducing Adversarially Robust Learning to Non-Robust PAC Learning [39.51923275855131]
We give a reduction that can robustly learn any hypothesis class using any non-robust learner. The number of calls to $mathcalA$ depends logarithmically on the number of allowed adversarial perturbations per example.
arXiv Detail & Related papers (2020-10-22T20:28:35Z)
Beyond $\mathcal{H}$-Divergence: Domain Adaptation Theory With Jensen-Shannon Divergence [21.295136514836788]
We reveal the incoherence between the widely-adopted empirical domain adversarial training and its generally-assumed theoretical counterpart based on $mathcalH$-divergence. We establish a new theoretical framework by directly proving the upper and lower target risk bounds based on joint distributional Jensen-Shannon divergence.
arXiv Detail & Related papers (2020-07-30T16:19:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.