Efficient PAC Learning from the Crowd with Pairwise Comparison
- URL: http://arxiv.org/abs/2011.01104v3
- Date: Thu, 20 Jan 2022 04:01:42 GMT
- Title: Efficient PAC Learning from the Crowd with Pairwise Comparison
- Authors: Jie Shen, Shiwei Zeng
- Abstract summary: We study the problem of PAC learning threshold functions from the crowd, where the annotators can provide (noisy) labels or pairwise comparison tags.
We design a label-efficient algorithm that interleaves learning and annotation, which leads to a constant overhead of our algorithm.
- Score: 7.594050968868919
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Efficient PAC learning of threshold functions is arguably one of the most
important problems in machine learning. With the unprecedented growth of
large-scale data sets, it has become ubiquitous to appeal to the crowd wisdom
for data annotation, and the central problem that has attracted a surge of recent
interest is how one can learn the underlying hypothesis from the highly noisy
crowd annotation while keeping the annotation cost under control. On the other hand,
a large body of recent work has investigated the problem of learning with not
only labels, but also pairwise comparisons, since in many applications it is
easier to compare than to label. In this paper, we study the problem of PAC
learning threshold functions from the crowd, where the annotators can provide
(noisy) labels or pairwise comparison tags. We design a label-efficient
algorithm that interleaves learning and annotation, which leads to a constant
overhead of our algorithm (a notion that characterizes the query complexity).
In contrast, a natural approach of annotation followed by learning leads to an
overhead growing with the sample size.
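The following is a minimal toy sketch, not the paper's algorithm, of why comparison tags can reduce the number of label queries for a one-dimensional threshold. It assumes distinct points, a hypothetical `Crowd` oracle whose workers flip each answer independently with probability `flip_prob`, and majority voting over an odd number of workers; all names and parameters here are illustrative assumptions. Comparisons are used to sort the points, after which a binary search with labels locates the decision boundary.

```python
import random
from functools import cmp_to_key


def majority(votes):
    """Return +1 or -1 according to the majority of +/-1 votes."""
    return 1 if sum(votes) > 0 else -1


class Crowd:
    """Hypothetical crowd oracle for a 1-D threshold: true label is +1 iff x >= theta."""

    def __init__(self, theta, flip_prob, workers, seed=0):
        self.theta = theta
        self.flip_prob = flip_prob      # chance that a single worker answers wrongly
        self.workers = workers          # odd number of votes collected per query
        self.rng = random.Random(seed)
        self.label_queries = 0
        self.compare_queries = 0

    def _noisy(self, truth):
        return -truth if self.rng.random() < self.flip_prob else truth

    def label(self, x):
        """Majority-voted noisy label of x."""
        self.label_queries += self.workers
        truth = 1 if x >= self.theta else -1
        return majority([self._noisy(truth) for _ in range(self.workers)])

    def compare(self, a, b):
        """Majority-voted noisy comparison: +1 if a is ranked above b, else -1."""
        self.compare_queries += self.workers
        truth = 1 if a > b else -1
        return majority([self._noisy(truth) for _ in range(self.workers)])


def compare_then_label(points, crowd):
    """Sort with comparison queries, then binary-search the boundary with label queries."""
    ordered = sorted(points, key=cmp_to_key(crowd.compare))
    lo, hi = 0, len(ordered)
    while lo < hi:                      # find the first position labeled +1
        mid = (lo + hi) // 2
        if crowd.label(ordered[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return {x: (1 if i >= lo else -1) for i, x in enumerate(ordered)}


points = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
crowd = Crowd(theta=0.45, flip_prob=0.0, workers=3)
labels = compare_then_label(points, crowd)
```

In this sketch the number of label queries grows only logarithmically with the number of points, while the sorting step carries most of the query cost as comparisons. With `flip_prob` above zero the sort and the search can both err, which is the regime where a more careful analysis such as the paper's is needed.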
Related papers
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
- DIRECT: Deep Active Learning under Imbalance and Label Noise [15.571923343398657]
We conduct the first study of active learning under both class imbalance and label noise.
We propose a novel algorithm that robustly identifies the class separation threshold and annotates the most uncertain examples.
Our results demonstrate that DIRECT can save more than 60% of the annotation budget compared to state-of-the-art active learning algorithms.
arXiv Detail & Related papers (2023-12-14T18:18:34Z)
- Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
arXiv Detail & Related papers (2023-07-25T19:40:41Z)
- Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition [70.00984078351927]
This paper focuses on reducing noise based on some inherent properties of multi-label classification and long-tailed learning under noisy cases.
We propose a Stitch-Up augmentation to synthesize a cleaner sample, which directly reduces multi-label noise.
A Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions.
arXiv Detail & Related papers (2023-07-03T09:20:28Z)
- Improved Robust Algorithms for Learning with Discriminative Feature Feedback [21.58493386054356]
Discriminative Feature Feedback is a protocol for interactive learning based on feature explanations that are provided by a human teacher.
We provide new robust interactive learning algorithms for the Discriminative Feature Feedback model.
arXiv Detail & Related papers (2022-09-08T12:11:12Z)
- What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library, CL-HAR, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based ALs are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Bounded Memory Active Learning through Enriched Queries [28.116967200489192]
Active learning is a paradigm in which data-hungry learning algorithms adaptively select informative examples in order to lower expensive labeling costs.
To combat this, a series of recent works have considered a model in which the learner may ask enriched queries beyond labels.
While such models have seen success in drastically lowering label costs, they tend to come at the expense of requiring large amounts of memory.
arXiv Detail & Related papers (2021-02-09T19:00:00Z)
- Noise-tolerant, Reliable Active Classification with Comparison Queries [25.725730509014355]
We study the paradigm of active learning, in which algorithms with access to large pools of data may adaptively choose what samples to label.
We provide the first time and query efficient algorithms for learning non-homogeneous linear separators robust to bounded (Massart) noise.
arXiv Detail & Related papers (2020-01-15T19:00:00Z)