Identify ambiguous tasks combining crowdsourced labels by weighting
Areas Under the Margin
- URL: http://arxiv.org/abs/2209.15380v3
- Date: Thu, 30 Nov 2023 15:10:47 GMT
- Title: Identify ambiguous tasks combining crowdsourced labels by weighting
Areas Under the Margin
- Authors: Tanguy Lefort and Benjamin Charlier and Alexis Joly and Joseph Salmon
- Abstract summary: Ambiguous tasks might fool expert workers, which is often harmful for the learning step.
We adapt the Area Under the Margin (AUM), originally designed to identify mislabeled data, to spot ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted Areas Under the Margin (WAUM).
We show that the WAUM can help discard ambiguous tasks from the training set, leading to better generalization performance.
- Score: 13.437403258942716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In supervised learning - for instance in image classification - modern
massive datasets are commonly labeled by a crowd of workers. The labels obtained
in this crowdsourcing setting are then aggregated for training, generally
leveraging a per-worker trust score. Yet, such worker-oriented approaches
discard the tasks' ambiguity. Ambiguous tasks might fool expert workers, which
is often harmful for the learning step. In standard supervised learning
settings - with one label per task - the Area Under the Margin (AUM) was
tailored to identify mislabeled data. We adapt the AUM to identify ambiguous
tasks in crowdsourced learning scenarios, introducing the Weighted Areas Under
the Margin (WAUM). The WAUM is an average of AUMs weighted according to
task-dependent scores. We show that the WAUM can help discard ambiguous tasks
from the training set, leading to better generalization performance. We report
improvements over existing strategies for learning with a crowd, both in
simulated settings and on real datasets such as CIFAR-10H (a crowdsourced
dataset with a high number of answered labels), LabelMe and Music (two datasets
with few answered votes).
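The abstract describes the WAUM as an average of per-worker AUMs weighted by task-dependent scores. As a rough illustration of these two quantities, here is a minimal sketch in Python; the margin definition (assigned-label logit minus the largest other logit, averaged over training epochs), the reuse of a single logit history across workers, and all names (`aum`, `waum`, `logit_history`, `worker_labels`, `worker_scores`) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def aum(logit_history, label):
    """Area Under the Margin for one task under one assigned label.

    logit_history: array of shape (n_epochs, n_classes), the model's logits
    for this task recorded after each training epoch.
    label: class index assigned to the task (here, a worker's answer).
    Returns the margin (assigned-label logit minus the largest other logit)
    averaged over epochs; low values flag hard or mislabeled examples.
    """
    logit_history = np.asarray(logit_history, dtype=float)
    assigned = logit_history[:, label]
    best_other = np.delete(logit_history, label, axis=1).max(axis=1)
    return float(np.mean(assigned - best_other))

def waum(logit_history, worker_labels, worker_scores):
    """Weighted Areas Under the Margin for one crowdsourced task.

    worker_labels: dict {worker_id: class index answered by that worker}.
    worker_scores: dict {worker_id: non-negative task-dependent score}.
    Returns the score-weighted average of the per-worker AUMs.
    """
    num = den = 0.0
    for worker, answer in worker_labels.items():
        score = worker_scores[worker]
        num += score * aum(logit_history, answer)
        den += score
    return num / den if den > 0 else 0.0
```

Under this reading, tasks whose WAUM falls below a chosen quantile of the training set would be the candidates to discard before aggregation and training, in the spirit of the pruning step mentioned in the abstract.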
Related papers
- Association Graph Learning for Multi-Task Classification with Category
Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z) - Is margin all you need? An extensive empirical study of active learning
on tabular data [66.18464006872345]
We analyze the performance of a variety of active learning algorithms on 69 real-world datasets from the OpenML-CC18 benchmark.
Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including the current state of the art.
arXiv Detail & Related papers (2022-10-07T21:18:24Z) - Using Self-Supervised Pretext Tasks for Active Learning [7.214674613451605]
We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative.
The pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and grouped into batches by their pretext task losses.
In each iteration, the main task model is used to sample the most uncertain data in a batch to be annotated.
arXiv Detail & Related papers (2022-01-19T07:58:06Z) - CaSP: Class-agnostic Semi-Supervised Pretraining for Detection and
Segmentation [60.28924281991539]
We propose a novel Class-agnostic Semi-supervised Pretraining (CaSP) framework to achieve a more favorable task-specificity balance.
Using 3.6M unlabeled samples, we achieve a remarkable performance gain of 4.7% over the ImageNet-pretrained baseline on object detection.
arXiv Detail & Related papers (2021-12-09T14:54:59Z) - Out-distribution aware Self-training in an Open World Setting [62.19882458285749]
We leverage unlabeled data in an open world setting to further improve prediction performance.
We introduce out-distribution aware self-training, which includes a careful sample selection strategy.
Our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
arXiv Detail & Related papers (2020-12-21T12:25:04Z) - Boosting the Performance of Semi-Supervised Learning with Unsupervised
Clustering [10.033658645311188]
We show that ignoring labels altogether for whole epochs intermittently during training can significantly improve performance in the small sample regime.
We demonstrate our method's efficacy in boosting several state-of-the-art SSL algorithms.
arXiv Detail & Related papers (2020-12-01T14:19:14Z) - End-to-End Learning from Noisy Crowd to Supervised Machine Learning
Models [6.278267504352446]
We advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data.
We show how label aggregation can benefit from estimating the annotators' confusion matrix to improve the learning process (a minimal sketch of this style of aggregation is given after this list).
We demonstrate the effectiveness of our strategies on several image datasets, using SVM and deep neural networks.
arXiv Detail & Related papers (2020-11-13T09:48:30Z) - Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z) - Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z) - Deep Categorization with Semi-Supervised Self-Organizing Maps [0.0]
This article presents a semi-supervised model called the Batch Semi-Supervised Self-Organizing Map (Batch SS-SOM).
The results show that Batch SS-SOM is a good option for semi-supervised classification and clustering.
It performs well in terms of accuracy and clustering error, even with a small number of labeled samples.
arXiv Detail & Related papers (2020-06-17T22:00:04Z) - Task-Aware Variational Adversarial Active Learning [42.334671410592065]
We propose task-aware variational adversarial AL (TA-VAAL) that modifies task-agnostic VAAL.
Our proposed TA-VAAL outperforms state-of-the-art methods on various benchmark datasets for classification with balanced and imbalanced labels.
arXiv Detail & Related papers (2020-02-11T22:00:48Z)
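The "End-to-End Learning from Noisy Crowd to Supervised Machine Learning Models" entry above points out that label aggregation benefits from estimating each annotator's confusion matrix. As a point of reference only, the sketch below implements the classical confusion-matrix aggregation of Dawid and Skene (1979) with a short EM loop; the data layout (`votes` as per-task dictionaries of worker answers) and the fixed iteration count are assumptions, and this is a generic baseline, not the framework proposed in that paper.

```python
import numpy as np

def dawid_skene(votes, n_classes, n_iter=50):
    """Confusion-matrix label aggregation in the style of Dawid & Skene (1979).

    votes: list of dicts, one per task, mapping worker_id -> answered class.
    Returns an array of shape (n_tasks, n_classes) of posterior label
    probabilities for each task.
    """
    n_tasks = len(votes)
    workers = sorted({w for task_votes in votes for w in task_votes})
    w_index = {w: k for k, w in enumerate(workers)}

    # Initialize posteriors with majority voting (assumes every task has votes).
    T = np.zeros((n_tasks, n_classes))
    for i, task_votes in enumerate(votes):
        for answer in task_votes.values():
            T[i, answer] += 1.0
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per worker.
        pi = T.mean(axis=0) + 1e-12
        conf = np.full((len(workers), n_classes, n_classes), 1e-6)
        for i, task_votes in enumerate(votes):
            for w, answer in task_votes.items():
                conf[w_index[w], :, answer] += T[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute per-task posteriors from priors and confusions.
        for i, task_votes in enumerate(votes):
            log_post = np.log(pi)
            for w, answer in task_votes.items():
                log_post += np.log(conf[w_index[w], :, answer])
            log_post -= log_post.max()
            T[i] = np.exp(log_post)
            T[i] /= T[i].sum()
    return T
```

The returned posteriors give a hard label per task via `argmax(axis=1)`, and the estimated confusion matrices play a role comparable to the per-worker trust scores mentioned in the main abstract.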