Do PAC-Learners Learn the Marginal Distribution?
- URL: http://arxiv.org/abs/2302.06285v1
- Date: Mon, 13 Feb 2023 11:42:58 GMT
- Title: Do PAC-Learners Learn the Marginal Distribution?
- Authors: Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan
- Abstract summary: We study a variant of PAC-Learning in which the adversary is restricted to a known family of marginal distributions $\mathscr{P}$.
We show that TV-learning is \emph{equivalent} to PAC-Learning.
- Score: 24.80812483480747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a foundational variant of Valiant and Vapnik and Chervonenkis'
Probably Approximately Correct (PAC)-Learning in which the adversary is
restricted to a known family of marginal distributions $\mathscr{P}$. In
particular, we study how the PAC-learnability of a triple $(\mathscr{P},X,H)$
relates to the learner's ability to infer \emph{distributional} information
about the adversary's choice of $D \in \mathscr{P}$. To this end, we introduce
the `unsupervised' notion of \emph{TV-Learning}, which, given a class
$(\mathscr{P},X,H)$, asks the learner to approximate $D$ from unlabeled samples
with respect to a natural class-conditional total variation metric.
In the classical distribution-free setting, we show that TV-learning is
\emph{equivalent} to PAC-Learning: in other words, any learner must infer
near-maximal information about $D$. On the other hand, we show this
characterization breaks down for general $\mathscr{P}$, where PAC-Learning is
strictly sandwiched between two approximate variants we call `Strong' and
`Weak' TV-learning, roughly corresponding to unsupervised learners that
estimate most relevant distances in $D$ with respect to $H$, but differ in
whether the learner \emph{knows} the set of well-estimated events. Finally, we
observe that TV-learning is in fact equivalent to the classical notion of
\emph{uniform estimation}, and thereby give a strong refutation of the uniform
convergence paradigm in supervised learning.
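For intuition on the object being learned, the snippet below sketches a plug-in estimate of a TV-style distance restricted to hypothesis-disagreement events, using an assumed finite class of threshold functions; it illustrates the flavor of quantity involved, not the paper's actual class-conditional metric or algorithm.

```python
# Illustrative sketch only (not the paper's construction): a plug-in
# estimate of a TV-style distance between two samples, restricted to
# disagreement events {x : h(x) != h'(x)} for hypotheses h, h' in a
# finite class H.  The paper's class-conditional metric may differ.
import itertools
import numpy as np

def class_conditional_tv(sample_p, sample_q, hypotheses):
    """Max over pairs (h, h') in H of |P(h != h') - Q(h != h')|,
    estimated from unlabeled samples drawn from P and Q."""
    best = 0.0
    for h, hp in itertools.combinations(hypotheses, 2):
        mass_p = np.mean(h(sample_p) != hp(sample_p))
        mass_q = np.mean(h(sample_q) != hp(sample_q))
        best = max(best, abs(mass_p - mass_q))
    return best

# Toy usage: threshold classifiers on the real line.
rng = np.random.default_rng(0)
H = [lambda x, t=t: x >= t for t in np.linspace(-2, 2, 9)]
P = rng.normal(0.0, 1.0, size=5000)
Q = rng.normal(0.3, 1.0, size=5000)
print(class_conditional_tv(P, Q, H))
```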
Related papers
- On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning [57.18649648182171]
We make contributions towards addressing a problem that hasn't been studied so far in the context of MI-PLL.
We derive class-specific risk bounds for MI-PLL, while making minimal assumptions.
Our theory reveals a unique phenomenon: that $\sigma$ can greatly impact learning imbalances.
arXiv Detail & Related papers (2024-07-13T20:56:34Z) - Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension [17.485243410774814]
In traditional models of supervised learning, the goal of a learner is to output a hypothesis that is competitive (to within $\epsilon$) of the best fitting concept from some class.
We introduce a smoothed-analysis framework that requires a learner to compete only with the best agnostic.
We obtain the first algorithm for agnostically learning intersections of $k$-halfspaces in this smoothed framework.
arXiv Detail & Related papers (2024-07-01T04:58:36Z) - Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class.
Our result holds whether the problem is realizable or not, and we refer to this as a \emph{near mixing-free rate}, since direct dependence on mixing is relegated to an additive higher-order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z) - Beyond Expectations: Learning with Stochastic Dominance Made Practical [88.06211893690964]
Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes.
Despite being theoretically appealing, the application of stochastic dominance in machine learning has been scarce.
We first generalize the dominance concept to enable feasible comparisons between any arbitrary pair of random variables.
We then develop a simple and efficient approach for finding the optimal solution in terms of dominance.
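As background for the entry above, the sketch below checks the classical first-order notion of stochastic dominance between two empirical samples ($X$ dominates $Y$ when $F_X(t) \le F_Y(t)$ for all $t$); the paper's generalized dominance concept and optimization approach are not reproduced here.

```python
# Illustrative sketch only: an empirical check of first-order stochastic
# dominance, one standard instance of the dominance notions discussed above.
import numpy as np

def first_order_dominates(x, y, grid_size=200):
    """Return True if the empirical CDF of x lies (weakly) below that of y
    on an evaluation grid spanning both samples."""
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), grid_size)
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return bool(np.all(cdf_x <= cdf_y + 1e-12))

rng = np.random.default_rng(1)
returns_a = rng.normal(1.0, 1.0, 10_000)   # higher mean, same spread
returns_b = rng.normal(0.0, 1.0, 10_000)
print(first_order_dominates(returns_a, returns_b))  # typically True
```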
arXiv Detail & Related papers (2024-02-05T03:21:23Z) - The Sample Complexity of Multi-Distribution Learning for VC Classes [25.73730126599202]
Multi-distribution learning is a generalization of PAC learning to settings with multiple data distributions.
There remains a significant gap between the known upper and lower bounds for PAC-learnable classes.
We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.
arXiv Detail & Related papers (2023-07-22T18:02:53Z) - Differentially-Private Bayes Consistency [70.92545332158217]
We construct a Bayes consistent learning rule that satisfies differential privacy (DP).
We prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal sample complexity.
arXiv Detail & Related papers (2022-12-08T11:57:30Z) - Learning versus Refutation in Noninteractive Local Differential Privacy [133.80204506727526]
We study two basic statistical tasks in non-interactive local differential privacy (LDP): learning and refutation.
Our main result is a complete characterization of the sample complexity of PAC learning for non-interactive LDP protocols.
arXiv Detail & Related papers (2022-10-26T03:19:24Z) - When are Local Queries Useful for Robust Learning? [25.832511407411637]
We study learning models where the learner is given more power through the use of local queries.
We give the first distribution-free algorithms that perform robust empirical risk minimization.
We finish by giving robust learning algorithms for halfspaces on $\{0,1\}^n$ and then obtaining robustness guarantees for halfspaces in $\mathbb{R}^n$ against precision-bounded adversaries.
arXiv Detail & Related papers (2022-10-12T11:04:22Z) - On Optimal Learning Under Targeted Data Poisoning [48.907813854832206]
In this work we aim to characterize the smallest achievable error $\epsilon=\epsilon(\eta)$ by the learner in the presence of such an adversary.
Remarkably, we show that the upper bound can be attained by a deterministic learner.
arXiv Detail & Related papers (2022-10-06T06:49:48Z) - Quantization for decentralized learning under subspace constraints [61.59416703323886]
We consider decentralized optimization problems where agents have individual cost functions to minimize subject to subspace constraints.
We propose and study an adaptive decentralized strategy where the agents employ differential randomized quantizers to compress their estimates.
The analysis shows that, under some general conditions on the quantization noise, the strategy is stable both in terms of mean-square error and average bit rate.
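As background for the entry above, the sketch below implements a subtractive-dithered uniform quantizer, one common form of randomized quantizer; the paper's differential quantization strategy and its mean-square-error/bit-rate analysis are not reproduced here.

```python
# Background sketch: a subtractive-dithered uniform quantizer.  This is a
# generic randomized quantizer for illustration, not the paper's scheme.
import numpy as np

def dithered_quantize(x, step, rng):
    """Quantize x to a grid of spacing `step` with subtractive dither,
    making the quantization error roughly uniform and signal-independent."""
    x = np.asarray(x, dtype=float)
    dither = rng.uniform(-0.5 * step, 0.5 * step, size=x.shape)
    return step * np.round((x + dither) / step) - dither

rng = np.random.default_rng(0)
estimate = rng.normal(size=5)          # stand-in for an agent's local estimate
print(estimate)
print(dithered_quantize(estimate, step=0.25, rng=rng))
```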
arXiv Detail & Related papers (2022-09-16T09:38:38Z) - Cryptographic Hardness of Learning Halfspaces with Massart Noise [59.8587499110224]
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
We show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, even if the optimal 0-1 error is small.
arXiv Detail & Related papers (2022-07-28T17:50:53Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
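To ground the terminology in the entry above, the sketch below shows one standard baseline for binary density ratio estimation (train a probabilistic classifier to separate samples from $p$ and $q$, then apply Bayes' rule); it assumes scikit-learn is available and is not the Bregman-divergence framework developed in the paper.

```python
# A common baseline for binary density ratio estimation, shown only for
# illustration: a classifier separating samples from p and q yields
# r(x) = p(x)/q(x) via Bayes' rule on its predicted probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=(2000, 1))   # samples from p
x_q = rng.normal(0.5, 1.2, size=(2000, 1))   # samples from q

X = np.vstack([x_p, x_q])
y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])
clf = LogisticRegression().fit(X, y)

def density_ratio(x):
    """Estimate p(x)/q(x) from the classifier's class probabilities,
    correcting for the class prior n_q / n_p."""
    prob_p = clf.predict_proba(x)[:, 1]
    prior_correction = len(x_q) / len(x_p)
    return prior_correction * prob_p / (1.0 - prob_p)

print(density_ratio(np.array([[0.0], [1.0]])))
```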
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
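For reference on the metric used in the entry above, the sketch below computes the two-sample Kolmogorov-Smirnov distance (the sup-norm gap between empirical CDFs); the auction-learning algorithms themselves are not reproduced here.

```python
# Illustrative helper: two-sample Kolmogorov-Smirnov distance.
import numpy as np

def ks_distance(x, y):
    """Max gap between the empirical CDFs of x and y."""
    grid = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.max(np.abs(cdf_x - cdf_y)))

rng = np.random.default_rng(0)
valuations = rng.exponential(1.0, size=2000)               # stand-in bidder valuations
perturbed = valuations + rng.normal(0.0, 0.05, size=2000)  # a slightly perturbed sample
print(ks_distance(valuations, perturbed))
```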
arXiv Detail & Related papers (2021-07-13T17:37:21Z) - Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective [26.704446184314506]
We study the problem of measuring the fairness of a machine learning model under noisy information.
We present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible.
arXiv Detail & Related papers (2021-05-20T18:36:28Z) - Provable Robustness of Adversarial Training for Learning Halfspaces with Noise [95.84614821570283]
We analyze adversarial training for learning adversarially robust halfspaces in the presence of label noise.
To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise.
arXiv Detail & Related papers (2021-04-19T16:35:38Z) - On Agnostic PAC Learning using $\mathcal{L}_2$-polynomial Regression and Fourier-based Algorithms [10.66048003460524]
We develop a framework using Hilbert spaces as a proxy to analyze PAC learning problems with structural properties.
We demonstrate that PAC learning with 0-1 loss is equivalent to an optimization in the Hilbert space domain.
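As a rough illustration of the $\mathcal{L}_2$-regression idea in the entry above (the classical recipe: least-squares fit of a low-degree polynomial to the $\pm 1$ labels, then predict with its sign), the sketch below is a toy one-dimensional version and is not claimed to be the algorithm analyzed in the paper.

```python
# Toy sketch of L2 polynomial regression for agnostic learning: fit a
# low-degree polynomial to the +/-1 labels by least squares, then predict
# with its sign.  Illustrates the general technique only.
import numpy as np

def fit_poly_l2(x, y, degree=3):
    """Least-squares fit of a degree-`degree` polynomial to labels y in {-1,+1};
    returns a sign-thresholded predictor."""
    features = np.vander(x, degree + 1)          # 1D feature map: [x^d, ..., x, 1]
    coeffs, *_ = np.linalg.lstsq(features, y, rcond=None)
    return lambda xs: np.sign(np.vander(xs, degree + 1) @ coeffs)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 3000)
y = np.where(np.abs(x) < 0.5, 1.0, -1.0)                    # an interval concept
y *= np.where(rng.uniform(size=x.size) < 0.1, -1.0, 1.0)    # 10% label noise
h = fit_poly_l2(x, y, degree=4)
print(np.mean(h(x) != y))                                    # empirical 0-1 error
```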
arXiv Detail & Related papers (2021-02-11T21:28:55Z) - Arbitrary Conditional Distributions with Energy [11.081460215563633]
A more general and useful problem is arbitrary conditional density estimation.
We propose a novel method, Arbitrary Conditioning with Energy (ACE), that can simultaneously estimate the distribution $p(\mathbf{x}_u \mid \mathbf{x}_o)$.
We also simplify the learning problem by only learning one-dimensional conditionals, from which more complex distributions can be recovered during inference.
arXiv Detail & Related papers (2021-02-08T18:36:26Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - Reducing Adversarially Robust Learning to Non-Robust PAC Learning [39.51923275855131]
We give a reduction that can robustly learn any hypothesis class using any non-robust learner.
The number of calls to $\mathcal{A}$ depends logarithmically on the number of allowed adversarial perturbations per example.
arXiv Detail & Related papers (2020-10-22T20:28:35Z) - Beyond $\mathcal{H}$-Divergence: Domain Adaptation Theory With Jensen-Shannon Divergence [21.295136514836788]
We reveal the incoherence between the widely-adopted empirical domain adversarial training and its generally-assumed theoretical counterpart based on $\mathcal{H}$-divergence.
We establish a new theoretical framework by directly proving the upper and lower target risk bounds based on joint distributional Jensen-Shannon divergence.
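For reference on the divergence used in the entry above, the sketch below computes the Jensen-Shannon divergence between two discrete distributions; the paper's bounds involve joint distributions over inputs and labels, so this is only a toy helper.

```python
# Toy helper: Jensen-Shannon divergence between two discrete distributions.
import numpy as np

def kl(p, q):
    """KL(p || q), summing only where p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """JS(p, q) = 0.5*KL(p||m) + 0.5*KL(q||m) with m the mixture (p+q)/2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js_divergence([0.7, 0.2, 0.1], [0.5, 0.3, 0.2]))
```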
arXiv Detail & Related papers (2020-07-30T16:19:59Z)