NeuCrowd: Neural Sampling Network for Representation Learning with
Crowdsourced Labels
- URL: http://arxiv.org/abs/2003.09660v4
- Date: Thu, 16 Dec 2021 02:54:47 GMT
- Title: NeuCrowd: Neural Sampling Network for Representation Learning with
Crowdsourced Labels
- Authors: Yang Hao, Wenbiao Ding, Zitao Liu
- Abstract summary: We propose \emph{NeuCrowd}, a unified framework for supervised representation learning (SRL) from crowdsourced labels.
The proposed framework is evaluated on one synthetic and three real-world data sets.
- Score: 19.345894148534335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Representation learning approaches require a massive amount of
discriminative training data, which is unavailable in many scenarios, such as
healthcare, smart cities, and education. In practice, people resort to
crowdsourcing to obtain annotated labels. However, due to issues such as data
privacy, budget limitations, and a shortage of domain-specific annotators, the
number of crowdsourced labels is still very limited. Moreover, because of
annotators' diverse expertise, crowdsourced labels are often inconsistent.
Thus, directly applying existing supervised representation learning (SRL)
algorithms may easily lead to overfitting and suboptimal solutions. In this
paper, we propose
\emph{NeuCrowd}, a unified framework for SRL from crowdsourced labels. The
proposed framework (1) creates a sufficient number of high-quality
\emph{n}-tuplet training samples by utilizing safety-aware sampling and robust
anchor generation; and (2) automatically learns a neural sampling network that
adaptively learns to select effective samples for SRL networks. The proposed
framework is evaluated on one synthetic and three real-world data sets.
The results show that our approach outperforms a wide range of state-of-the-art
baselines in terms of prediction accuracy and AUC. To encourage reproducible
results, we make our code publicly available at
\url{https://github.com/tal-ai/NeuCrowd_KAIS2021}.
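As a rough illustration of the two ingredients above, here is a hypothetical sketch of an n-tuplet loss built around a centroid-style robust anchor, plus a small sampling network that scores candidate tuplets. All module names and architectural details below are assumptions, not the authors' implementation; the real code lives in the repository linked above.

```python
# Hypothetical sketch of NeuCrowd's two ingredients (details assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

def robust_anchor(same_class_embeds):
    """One plausible 'robust anchor': the centroid of same-class embeddings.

    same_class_embeds: (B, M, D) -> (B, D)
    """
    return same_class_embeds.mean(dim=1)

def n_tuplet_loss(anchor, positive, negatives):
    """Softmax-style n-tuplet loss: the positive should be more similar to
    the anchor than any of the negatives.

    anchor, positive: (B, D); negatives: (B, K, D)
    """
    pos = (anchor * positive).sum(dim=-1, keepdim=True)      # (B, 1)
    neg = torch.einsum('bd,bkd->bk', anchor, negatives)      # (B, K)
    logits = torch.cat([pos, neg], dim=1)                    # (B, 1 + K)
    target = torch.zeros(anchor.size(0), dtype=torch.long,
                         device=anchor.device)               # positive = index 0
    return F.cross_entropy(logits, target)

class SamplingNetwork(nn.Module):
    """Scores candidate tuplets; higher = more effective for the SRL network."""
    def __init__(self, embed_dim, n):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim * n, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, tuplets):              # tuplets: (B, n, D)
        return self.scorer(tuplets.flatten(1)).squeeze(-1)   # (B,)
```

In this reading, the sampling network's scores decide which tuplets are fed to the SRL network at each step, matching the abstract's description of adaptively selecting effective samples.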
Related papers
- Text-Guided Mixup Towards Long-Tailed Image Categorization [7.207351201912651]
In many real-world applications, the class labels of the training data can follow a long-tailed distribution.
We propose a novel text-guided mixup technique that takes advantage of the semantic relations between classes recognized by the pre-trained text encoder.
arXiv Detail & Related papers (2024-09-05T14:37:43Z)
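A hypothetical sketch of the text-guided mixup idea (not the paper's code): pair samples as in standard mixup, and let the similarity of the two class names under a frozen text encoder weight each mixed pair. Here `text_feats` is assumed to be a (num_classes, D) matrix of class-name embeddings from a pre-trained text encoder.

```python
# Illustrative text-guided mixup (assumed shape of the technique).
import torch

def text_guided_mixup(x, y, text_feats, alpha=1.0):
    """x: (B, ...) inputs; y: (B,) integer labels;
    text_feats: (num_classes, D) frozen class-name embeddings."""
    perm = torch.randperm(x.size(0))
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x + (1 - lam) * x[perm]
    # Cosine similarity of the two class names involved in each mix; one way
    # to use it is to up-weight mixes between semantically related classes.
    sim = torch.cosine_similarity(text_feats[y], text_feats[y[perm]], dim=-1)
    return x_mix, y, y[perm], lam, sim
```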
- Towards Realistic Long-tailed Semi-supervised Learning in an Open World [0.0]
We construct a more Realistic Open-world Long-tailed Semi-supervised Learning (ROLSSL) setting where there is no premise on the distribution relationships between known and novel categories.
Under the proposed ROLSSL setting, we propose a simple yet potentially effective solution called dual-stage logit adjustments.
Experiments on datasets such as CIFAR100 and ImageNet100 have demonstrated performance improvements of up to 50.1%.
arXiv Detail & Related papers (2024-05-23T12:53:50Z)
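The abstract does not spell out the two stages, so the sketch below shows only the generic logit-adjustment building block (subtracting a scaled log class-prior from the logits), which a dual-stage scheme would presumably apply with different priors or strengths per stage. This is an assumption, not the paper's recipe.

```python
# Generic post-hoc logit adjustment (a standard building block).
import torch

def adjust_logits(logits, class_counts, tau=1.0):
    """logits: (B, C); class_counts: (C,) label frequencies; tau: strength."""
    prior = class_counts.float() / class_counts.sum()
    return logits - tau * torch.log(prior + 1e-12)
```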
- SemiReward: A General Reward Model for Semi-supervised Learning [58.47299780978101]
Semi-supervised learning (SSL) has witnessed great progress with various improvements in the self-training framework with pseudo labeling.
The main challenge is how to distinguish high-quality pseudo labels in the presence of confirmation bias.
We propose a Semi-supervised Reward framework (SemiReward) that predicts reward scores to evaluate pseudo labels and keep only the high-quality ones.
arXiv Detail & Related papers (2023-10-04T17:56:41Z)
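A minimal sketch of the reward-filtering idea, with the architecture and threshold assumed rather than taken from SemiReward: score each (feature, pseudo-label) pair with a small reward head and keep only pseudo-labels above a threshold.

```python
# Minimal sketch of reward-based pseudo-label filtering (details assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Maps a (feature, pseudo-label) pair to a quality score in [0, 1]."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + num_classes, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, feats, pseudo_onehot):
        return self.net(torch.cat([feats, pseudo_onehot], dim=1)).squeeze(-1)

def filter_pseudo_labels(feats, pseudo, reward_head, num_classes, thresh=0.5):
    onehot = F.one_hot(pseudo, num_classes).float()
    keep = reward_head(feats, onehot) > thresh   # boolean mask over the batch
    return feats[keep], pseudo[keep]
```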
- Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
arXiv Detail & Related papers (2023-07-25T19:40:41Z)
- On Non-Random Missing Labels in Semi-Supervised Learning [114.62655062520425]
Semi-Supervised Learning (SSL) is fundamentally a missing label problem.
We explicitly incorporate "class" into SSL.
Our method not only significantly outperforms existing baselines but also surpasses other label bias removal SSL methods.
arXiv Detail & Related papers (2022-06-29T22:01:29Z)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z)
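Of MAK's three principles, "diversity" maps naturally onto greedy K-center selection; the sketch below shows that step alone (tailness and proximity scoring omitted) as a hypothetical illustration, not the MAK implementation.

```python
# Greedy K-center selection over candidate features: one plausible reading of
# MAK's 'diversity' principle (tailness/proximity scoring omitted).
import numpy as np

def k_center_greedy(candidates, seeds, k):
    """candidates: (N, D); seeds: (M, D) already-kept points; returns k indices."""
    # Distance from every candidate to its nearest already-covered point.
    d = np.linalg.norm(candidates[:, None] - seeds[None], axis=-1).min(axis=1)
    chosen = []
    for _ in range(k):
        i = int(d.argmax())            # farthest-from-coverage candidate
        chosen.append(i)
        d = np.minimum(d, np.linalg.norm(candidates - candidates[i], axis=-1))
    return chosen
```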
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
- Temporal-aware Language Representation Learning From Crowdsourced Labels [12.40460861125743]
We propose \emph{TACMA}, a temporal-aware language representation learning algorithm for crowdsourced labels with multiple annotators.
The proposed approach is extremely easy to implement, requiring around five lines of code.
The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC.
arXiv Detail & Related papers (2021-07-15T05:25:56Z)
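The five-line claim suggests a per-example weighting of the loss; the sketch below guesses at that shape by weighting each example's cross-entropy by how strongly the annotators agree on it. The weighting scheme here is an assumption, not TACMA's actual temporal-aware heuristic.

```python
# Hypothetical agreement-weighted objective (TACMA's actual weighting differs).
import torch
import torch.nn.functional as F

def agreement_weights(annotations):
    """annotations: (B, A) labels from A annotators.

    Returns per-example weights (share of annotators agreeing with the
    majority vote) and the majority labels themselves.
    """
    majority = torch.mode(annotations, dim=1).values
    weights = (annotations == majority.unsqueeze(1)).float().mean(dim=1)
    return weights, majority

def weighted_cross_entropy(logits, annotations):
    weights, labels = agreement_weights(annotations)
    return (F.cross_entropy(logits, labels, reduction='none') * weights).mean()
```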
- Semi-supervised deep learning based on label propagation in a 2D embedded space [117.9296191012968]
Proposed solutions propagate labels from a small set of labeled images to a large set of unlabeled ones in order to train a deep neural network model.
We present a loop in which a deep neural network (VGG-16) is retrained at each iteration on a set containing progressively more correctly labeled samples.
As the labeled set improves across iterations, so do the features learned by the neural network.
arXiv Detail & Related papers (2020-08-02T20:08:54Z)
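A rough stand-in for one propagation round, with scikit-learn's t-SNE and LabelSpreading substituting for the paper's specific projection and propagation choices:

```python
# One label-propagation round in a 2D embedding; sklearn's TSNE/LabelSpreading
# stand in for the paper's actual projection and propagation components.
from sklearn.manifold import TSNE
from sklearn.semi_supervised import LabelSpreading

def propagate_in_2d(features, labels, n_neighbors=10):
    """features: (N, D); labels: (N,) with -1 marking unlabeled samples."""
    z = TSNE(n_components=2).fit_transform(features)   # 2D embedded space
    ls = LabelSpreading(kernel='knn', n_neighbors=n_neighbors).fit(z, labels)
    return ls.transduction_          # propagated labels for all N samples
```

The retrained network would then supply better `features` for the next round, which is what drives the loop described above.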
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)
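As a hedged illustration of that mechanism, the sketch below uses scikit-learn's GP regressor to estimate pseudo-ground-truth counts (with an uncertainty) for unlabeled samples; the paper's actual Gaussian Process formulation over crowd-counting features is more involved.

```python
# Illustrative GP pseudo-ground-truth step (scikit-learn's GP regressor stands
# in for the paper's Gaussian Process-based iterative mechanism).
from sklearn.gaussian_process import GaussianProcessRegressor

def estimate_pseudo_counts(labeled_feats, counts, unlabeled_feats):
    """Fit a GP on labeled (feature, count) pairs, predict for unlabeled ones."""
    gp = GaussianProcessRegressor().fit(labeled_feats, counts)
    mean, std = gp.predict(unlabeled_feats, return_std=True)
    return mean, std   # std can gate which pseudo-counts to trust in training
```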