A Light-weight, Effective and Efficient Model for Label Aggregation in
Crowdsourcing
- URL: http://arxiv.org/abs/2212.00007v1
- Date: Sat, 19 Nov 2022 11:13:03 GMT
- Title: A Light-weight, Effective and Efficient Model for Label Aggregation in
Crowdsourcing
- Authors: Yi Yang, Zhong-Qiu Zhao, Quan Bai, Qing Liu, Weihua Li
- Abstract summary: Label aggregation (LA) has emerged as a standard procedure to post-process crowdsourced labels.
In this paper, we treat LA as a dynamic system and model it as a Dynamic Bayesian network.
We derive two light-weight algorithms, LA\textsuperscript{onepass} and LA\textsuperscript{twopass}, which can effectively and efficiently estimate worker qualities and true labels.
- Score: 26.699587663952975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the noise in crowdsourced labels, label aggregation (LA)
has emerged as a standard procedure for post-processing them. LA methods
estimate true labels from crowdsourced labels by modeling worker qualities.
Most existing LA methods are iterative in nature. They need to traverse all the
crowdsourced labels multiple times in order to jointly and iteratively update
true labels and worker qualities until convergence. Consequently, these methods
have high space and time complexities. In this paper, we treat LA as a dynamic
system and model it as a Dynamic Bayesian network. From the dynamic model we
derive two light-weight algorithms, LA\textsuperscript{onepass} and
LA\textsuperscript{twopass}, which can effectively and efficiently estimate
worker qualities and true labels by traversing all the labels at most twice.
Due to the dynamic nature, the proposed algorithms can also estimate true
labels online without re-visiting historical data. We theoretically prove the
convergence property of the proposed algorithms, and bound the error of
estimated worker qualities. We also analyze the space and time complexities of
the proposed algorithms and show that they are equivalent to those of majority
voting. Experiments conducted on 20 real-world datasets demonstrate that the
proposed algorithms can effectively and efficiently aggregate labels in both
offline and online settings, even though they traverse all the labels at most twice.
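To make the complexity claim concrete, below is a minimal, illustrative Python sketch of a one-pass aggregator in the same spirit. It is not the paper's LA\textsuperscript{onepass} algorithm (whose update rules are derived from the dynamic Bayesian network); it only mirrors the stated cost profile by keeping a running quality estimate per worker and a weighted vote tally per item, so every label is visited exactly once.

```python
# Illustrative one-pass aggregation sketch -- NOT the paper's LA^onepass
# update rules, just a demonstration of the single-traversal cost profile.
from collections import defaultdict

def one_pass_aggregate(labels, num_classes, prior_quality=0.7):
    """labels: an iterable of (item_id, worker_id, label) triples, each seen once."""
    quality = defaultdict(lambda: prior_quality)      # running per-worker quality
    tally = defaultdict(lambda: [0.0] * num_classes)  # weighted votes per item
    count = defaultdict(int)                          # labels seen per worker

    for item, worker, y in labels:                    # single traversal
        votes = tally[item]
        votes[y] += quality[worker]                   # quality-weighted vote
        # Crude online quality update: agreement with the current running winner.
        winner = max(range(num_classes), key=votes.__getitem__)
        agree = 1.0 if y == winner else 0.0
        n = count[worker] = count[worker] + 1
        quality[worker] += (agree - quality[worker]) / n

    truth = {item: max(range(num_classes), key=v.__getitem__)
             for item, v in tally.items()}
    return truth, dict(quality)

# Three workers label two binary items; estimates are ready after one pass.
triples = [(0, "w1", 1), (0, "w2", 1), (0, "w3", 0),
           (1, "w1", 0), (1, "w2", 0), (1, "w3", 0)]
truth, qual = one_pass_aggregate(triples, num_classes=2)
print(truth)   # {0: 1, 1: 0}
```

Fixing the quality at a constant recovers plain majority voting, which is why the space and time complexities can match it; a dynamic formulation additionally supports the online setting, since a newly arriving label only touches the running statistics rather than historical data.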
Related papers
- Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement [8.804897656598051]
We propose SiDyP: Simplex Label Diffusion with Dynamic Prior to calibrate the classifier's prediction.
Our framework can increase the performance of the BERT classifier fine-tuned on both zero-shot and few-shot LLM-generated noisy label datasets by an average of 7.21% and 7.30% respectively.
arXiv Detail & Related papers (2025-05-26T08:31:55Z)
- Inaccurate Label Distribution Learning with Dependency Noise [52.08553913094809]
We introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning.
We show that DN-ILDL effectively addresses the ILDL problem and outperforms existing LDL methods.
arXiv Detail & Related papers (2024-05-26T07:58:07Z)
- SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow [35.76243023101549]
SPAM is a video label engine that provides high-quality labels with minimal human intervention.
We use a unified graph formulation to address the annotation of both detections and identity association for tracks across time.
We demonstrate that trackers trained on SPAM labels achieve comparable performance to those trained on human annotations.
arXiv Detail & Related papers (2024-04-17T14:33:41Z)
- Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning [8.387189407144403]
Partial label learning (PLL) is a weakly-supervised learning paradigm where each training instance is paired with a set of candidate labels (a partial label).
Noisy PLL (NPLL) relaxes this constraint by allowing some partial labels not to contain the true label, which makes the problem more practical.
We present a minimalistic framework that initially assigns pseudo-labels to images by exploiting the noisy partial labels through a weighted nearest neighbour algorithm.
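As an illustration only, a rough Python sketch of such a weighted nearest-neighbour pseudo-labelling step is given below; the feature representation, inverse-distance weighting, and candidate-set softening are assumptions made for the sketch, not the paper's exact formulation.

```python
# Hypothetical weighted k-NN pseudo-labelling for noisy partial labels.
# The weighting and candidate handling here are illustrative assumptions.
import numpy as np

def knn_pseudo_labels(features, candidate_masks, k=5):
    """features: (n, d) array; candidate_masks: (n, C) 0/1 array of candidate
    label sets (under NPLL, a set may omit the true label)."""
    n, num_classes = candidate_masks.shape
    dists = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)      # an instance is not its own neighbour
    pseudo = np.empty(n, dtype=int)
    for i in range(n):
        nn = np.argsort(dists[i])[:k]    # k nearest neighbours
        w = 1.0 / (dists[i, nn] + 1e-8)  # inverse-distance weights
        # Neighbours vote through their candidate sets, weighted by proximity.
        votes = (w[:, None] * candidate_masks[nn]).sum(axis=0)
        # Soft preference for the instance's own candidates: boost rather than
        # require them, since the candidate set may not contain the true label.
        votes += 0.5 * votes.max() * candidate_masks[i]
        pseudo[i] = int(votes.argmax())
    return pseudo
```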
arXiv Detail & Related papers (2024-02-07T13:32:47Z)
- Learning with Noisy Labels: Interconnection of Two Expectation-Maximizations [41.65589788264123]
Labor-intensive labeling becomes a bottleneck in developing computer vision algorithms based on deep learning.
We address learning with noisy labels (LNL) problem, which is formalized as a task of finding a structured manifold in the midst of noisy data.
Our algorithm achieves state-of-the-art performance in multiple standard benchmarks with substantial margins under various types of label noise.
arXiv Detail & Related papers (2024-01-09T07:22:30Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Complementary to Multiple Labels: A Correlation-Aware Correction Approach [65.59584909436259]
We show theoretically how the estimated transition matrix in multi-class CLL could be distorted in multi-labeled cases.
We propose a two-step method to estimate the transition matrix from candidate labels.
arXiv Detail & Related papers (2023-02-25T04:48:48Z)
- Improving Model Training via Self-learned Label Representations [5.969349640156469]
We show that more sophisticated label representations are better for classification than the usual one-hot encoding.
We propose Learning with Adaptive Labels (LwAL) algorithm, which simultaneously learns the label representation while training for the classification task.
Our algorithm introduces negligible additional parameters and has a minimal computational overhead.
arXiv Detail & Related papers (2022-09-09T21:10:43Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Instance-Dependent Partial Label Learning [69.49681837908511]
Partial label learning is a typical weakly supervised learning problem.
Most existing approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels.
In this paper, we consider the instance-dependent case and assume that each example is associated with a latent label distribution, in which a real number for each label describes how well that label fits the instance.
arXiv Detail & Related papers (2021-10-25T12:50:26Z)
- Instance-dependent Label-noise Learning under a Structural Causal Model [92.76400590283448]
Label noise will degenerate the performance of deep learning algorithms.
By leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning.
arXiv Detail & Related papers (2021-09-07T10:42:54Z)
- Group-aware Label Transfer for Domain Adaptive Person Re-identification [179.816105255584]
Unsupervised Domain Adaptation (UDA) person re-identification (ReID) aims at adapting a model trained on a labeled source-domain dataset to a target-domain dataset without any further annotations.
Most successful UDA-ReID approaches combine clustering-based pseudo-label prediction with representation learning and perform the two steps in an alternating fashion.
We propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.
arXiv Detail & Related papers (2021-03-23T07:57:39Z)
- Analysis of label noise in graph-based semi-supervised learning [2.4366811507669124]
In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution.
arXiv Detail & Related papers (2020-09-27T22:13:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.