Learning from Crowds with Sparse and Imbalanced Annotations
- URL: http://arxiv.org/abs/2107.05039v1
- Date: Sun, 11 Jul 2021 13:06:20 GMT
- Title: Learning from Crowds with Sparse and Imbalanced Annotations
- Authors: Ye Shi, Shao-Yuan Li, Sheng-Jun Huang
- Abstract summary: Crowdsourcing has established itself as an efficient labeling solution by resorting to non-expert crowds.
One common practice is to distribute each instance to multiple workers, while each worker only annotates a subset of the data, resulting in the *sparse annotation* phenomenon.
We propose a self-training based approach named *Self-Crowd* that progressively adds confident pseudo-annotations and rebalances the annotation distribution.
- Score: 29.596070201105274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional supervised learning requires ground truth labels for the training
data, whose collection can be difficult in many cases. Recently, crowdsourcing
has established itself as an efficient labeling solution by resorting to
non-expert crowds. To reduce the effects of labeling errors, one common practice is
to distribute each instance to multiple workers, while each worker only
annotates a subset of the data, resulting in the *sparse annotation*
phenomenon. In this paper, we note that in the presence of class imbalance,
i.e., when the ground truth labels are *class-imbalanced*, the sparse
annotations are prone to be skewed, which can severely bias
the learning algorithm. To combat this issue, we propose a self-training
based approach named *Self-Crowd* that progressively adds confident
pseudo-annotations and rebalances the annotation distribution. Specifically,
we propose a distribution-aware confidence measure for selecting confident
pseudo-annotations, which adopts a resampling strategy to oversample the
minority annotations and undersample the majority annotations. On a
real-world crowdsourcing image classification task, we show that the proposed
method yields more balanced annotations throughout training than
distribution-agnostic methods and substantially improves the learning
performance at different annotation sparsity levels.
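
To make the selection step concrete, here is a minimal sketch of a distribution-aware confidence measure of the kind the abstract describes: raw model confidences are reweighted by the inverse frequency of each predicted class in the current annotation pool, so minority-class candidates are favored (oversampled) and majority-class candidates are suppressed (undersampled). The function name, the `alpha` exponent, and the exact weighting scheme are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def select_confident_pseudo_annotations(probs, annotation_counts, k, alpha=1.0):
    """Pick k pseudo-annotations using a distribution-aware confidence score.

    probs:             (n_unannotated, n_classes) model class probabilities
    annotation_counts: (n_classes,) number of existing annotations per class
    alpha:             strength of the rebalancing term (assumed hyperparameter)
    """
    preds = probs.argmax(axis=1)          # model's predicted class per instance
    conf = probs.max(axis=1)              # raw confidence of that prediction

    # Classes under-represented in the current annotation pool get boosted,
    # over-represented classes get down-weighted -- the resampling effect.
    freq = annotation_counts / max(annotation_counts.sum(), 1)
    weight = (1.0 / (freq + 1e-8)) ** alpha
    weight = weight / weight.mean()       # keep weights on a comparable scale

    adjusted = conf * weight[preds]       # distribution-aware confidence
    chosen = np.argsort(-adjusted)[:k]    # top-k candidates to pseudo-annotate
    return chosen, preds[chosen]
```

In a self-training loop, the returned indices and pseudo-labels would be added to the annotation pool and `annotation_counts` updated before the next selection round.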
Related papers
- Exploring Vacant Classes in Label-Skewed Federated Learning [113.65301899666645]
Label skews, characterized by disparities in local label distribution across clients, pose a significant challenge in federated learning.
This paper introduces FedVLS, a novel approach to label-skewed federated learning that integrates vacant-class distillation and logit suppression simultaneously.
arXiv Detail & Related papers (2024-01-04T16:06:31Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that using full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
- Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks [9.110872603799839]
Supervised classification heavily depends on datasets annotated by humans.
In subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters.
In this work, we propose Annotator Aware Representations for Texts (AART) for subjective classification tasks.
arXiv Detail & Related papers (2023-11-16T10:18:32Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Multi-View Knowledge Distillation from Crowd Annotations for Out-of-Domain Generalization [53.24606510691877]
We propose new methods for acquiring soft-labels from crowd-annotations by aggregating the distributions produced by existing methods.
We demonstrate that these aggregation methods lead to the most consistent performance across four NLP tasks on out-of-domain test sets.
arXiv Detail & Related papers (2022-12-19T12:40:18Z)
- Regularizing Neural Network Training via Identity-wise Discriminative Feature Suppression [20.89979858757123]
When the number of training samples is small, or the class labels are noisy, networks tend to memorize patterns specific to individual instances to minimize the training error.
This paper explores a remedy by suppressing the network's tendency to rely on instance-specific patterns for empirical error minimization.
arXiv Detail & Related papers (2022-09-29T05:14:56Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- CrowdTeacher: Robust Co-teaching with Noisy Answers & Sample-specific Perturbations for Tabular Data [8.276156981100364]
Co-teaching methods have shown promising improvements for computer vision problems with noisy labels.
Our model, CrowdTeacher, uses the idea that robustness in the input space model can improve the perturbation of the classifier for noisy labels.
We showcase the boost in predictive power attained using CrowdTeacher for both synthetic and real datasets.
arXiv Detail & Related papers (2021-03-31T15:09:38Z)
- One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z)
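
Both One-bit Supervision entries above mention negative label suppression. Under the natural reading that each rejected guess rules out one class for that sample, a minimal sketch looks like the following; the function name and interface are assumptions, not the papers' code.

```python
import numpy as np

def suppress_negative_labels(probs, negative_classes):
    """Zero out classes ruled out by one-bit ("is it class c?" -> "no") feedback
    and renormalize the remaining probabilities.

    probs:            (n_samples, n_classes) predicted class probabilities
    negative_classes: list of sets; negative_classes[i] holds the class indices
                      already guessed and rejected for sample i
    """
    out = probs.copy()
    for i, ruled_out in enumerate(negative_classes):
        if ruled_out:
            out[i, list(ruled_out)] = 0.0    # the true label cannot be here
    # Renormalize each row to a valid distribution (guard against all-zero rows).
    out /= np.clip(out.sum(axis=1, keepdims=True), 1e-12, None)
    return out
```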