Semi-verified PAC Learning from the Crowd
- URL: http://arxiv.org/abs/2106.07080v3
- Date: Thu, 18 May 2023 19:19:30 GMT
- Title: Semi-verified PAC Learning from the Crowd
- Authors: Shiwei Zeng and Jie Shen
- Abstract summary: We study the problem of crowdsourced PAC learning of threshold functions.
We show that under the semi-verified model of Charikar et al., it is possible to PAC learn the underlying hypothesis class with a manageable amount of label queries.
- Score: 7.594050968868919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of crowdsourced PAC learning of threshold functions.
This is a challenging problem and only recently have query-efficient algorithms
been established under the assumption that a noticeable fraction of the workers
are perfect. In this work, we investigate a more challenging case where the
majority may behave adversarially and the rest behave as the Massart noise - a
significant generalization of the perfectness assumption. We show that under
the {semi-verified model} of Charikar et al. (2017), where we have (limited)
access to a trusted oracle who always returns correct annotations, it is
possible to PAC learn the underlying hypothesis class with a manageable amount
of label queries. Moreover, we show that the labeling cost can be drastically
mitigated via the more easily obtained comparison queries. Orthogonal to recent
developments in semi-verified or list-decodable learning that crucially rely on
data distributional assumptions, our PAC guarantee holds by exploring the
wisdom of the crowd.
Related papers
- From Chaos to Clarity: Claim Normalization to Empower Fact-Checking [57.024192702939736]
Claim Normalization (aka ClaimNorm) aims to decompose complex and noisy social media posts into more straightforward and understandable forms.
We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation.
Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures.
arXiv Detail & Related papers (2023-10-22T16:07:06Z) - Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL)
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
arXiv Detail & Related papers (2022-09-26T12:04:49Z) - Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective [25.157282476221482]
We show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior.
We offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets.
arXiv Detail & Related papers (2022-05-15T15:58:27Z) - A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability [57.502573663108535]
We study the problem of learning an adversarially robust predictor to test time attacks in the semi-supervised PAC model.
We show that there is a significant benefit in semi-supervised robust learning even in the worst-case distribution-free model.
arXiv Detail & Related papers (2022-02-11T03:01:45Z) - Learning Stochastic Majority Votes by Minimizing a PAC-Bayes
Generalization Bound [15.557653926558638]
We investigate a counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties.
We instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk.
The resulting majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight bounds.
arXiv Detail & Related papers (2021-06-23T16:57:23Z) - Risk Minimization from Adaptively Collected Data: Guarantees for
Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Efficient PAC Learning from the Crowd with Pairwise Comparison [7.594050968868919]
We study the problem of PAC learning threshold functions from the crowd, where the annotators can provide (noisy) labels or pairwise comparison tags.
We design a label-efficient algorithm that interleaves learning and annotation, which leads to a constant overhead of our algorithm.
arXiv Detail & Related papers (2020-11-02T16:37:55Z) - Probably Approximately Correct Constrained Learning [135.48447120228658]
We develop a generalization theory based on the probably approximately correct (PAC) learning framework.
We show that imposing a learner does not make a learning problem harder in the sense that any PAC learnable class is also a constrained learner.
We analyze the properties of this solution and use it to illustrate how constrained learning can address problems in fair and robust classification.
arXiv Detail & Related papers (2020-06-09T19:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.