Representation Learning from Limited Educational Data with Crowdsourced
Labels
- URL: http://arxiv.org/abs/2009.11222v1
- Date: Wed, 23 Sep 2020 15:34:40 GMT
- Title: Representation Learning from Limited Educational Data with Crowdsourced
Labels
- Authors: Wentao Wang, Guowei Xu, Wenbiao Ding, Gale Yan Huang, Guoliang Li,
Jiliang Tang and Zitao Liu
- Abstract summary: We propose a novel framework which aims to learn effective representations from limited data with crowdsourced labels.
Specifically, we design a grouping based deep neural network to learn embeddings from a limited number of training samples.
We develop a hard example selection procedure to adaptively pick training examples that are misclassified by the model.
- Score: 45.44620098891902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Representation learning has been proven to play an important role in the
unprecedented success of machine learning models in numerous tasks, such as
machine translation, face recognition and recommendation. The majority of
existing representation learning approaches often require a large number of
consistent and noise-free labels. However, due to various reasons such as
budget constraints and privacy concerns, labels are very limited in many
real-world scenarios. Directly applying standard representation learning
approaches on small labeled data sets will easily run into over-fitting
problems and lead to sub-optimal solutions. Even worse, in some domains such as
education, the limited labels are usually annotated by multiple workers with
diverse expertise, which yields noise and inconsistency in such crowdsourcing
settings. In this paper, we propose a novel framework which aims to learn
effective representations from limited data with crowdsourced labels.
Specifically, we design a grouping based deep neural network to learn
embeddings from a limited number of training samples and present a Bayesian
confidence estimator to capture the inconsistency among crowdsourced labels.
Furthermore, to expedite the training process, we develop a hard example
selection procedure that adaptively picks training examples
misclassified by the model. Extensive experiments conducted on three real-world
data sets demonstrate the superiority of our framework at learning
representations from limited data with crowdsourced labels, compared with
various state-of-the-art baselines. In addition, we provide a comprehensive
analysis of each of the main components of our proposed framework and report
the promising results it has achieved in our real production environment.
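The hard example selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the confidence threshold, and the use of argmax predictions over class probabilities are illustrative assumptions about how "adaptively picking misclassified examples" might look in practice.

```python
import numpy as np

def select_hard_examples(probs, labels, threshold=0.5):
    """Return indices of examples the model currently finds hard.

    probs:     (n, k) array of predicted class probabilities
    labels:    (n,) array of integer class labels (e.g., aggregated
               crowdsourced labels)
    threshold: illustrative confidence cutoff (an assumption, not a
               value from the paper)

    An example is kept if its predicted class disagrees with the label,
    or if the model's confidence in the true class falls below threshold.
    """
    preds = probs.argmax(axis=1)
    true_conf = probs[np.arange(len(labels)), labels]
    hard = (preds != labels) | (true_conf < threshold)
    return np.flatnonzero(hard)

# Toy usage: 4 examples, 2 classes
probs = np.array([[0.9, 0.1],    # confident and correct -> easy
                  [0.4, 0.6],    # wrong prediction      -> hard
                  [0.55, 0.45],  # correct, low confidence -> hard
                  [0.2, 0.8]])   # confident and correct -> easy
labels = np.array([0, 0, 0, 1])
print(select_hard_examples(probs, labels, threshold=0.6))  # -> [1 2]
```

In a training loop, the returned indices would be fed back as the next batch of training examples, concentrating updates on samples the model still gets wrong.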
Related papers
- Exploiting Minority Pseudo-Labels for Semi-Supervised Semantic Segmentation in Autonomous Driving [2.638145329894673]
We propose a professional training module to enhance minority class learning and a general training module to learn more comprehensive semantic information.
In experiments, our framework demonstrates superior performance compared to state-of-the-art methods on benchmark datasets.
arXiv Detail & Related papers (2024-09-19T11:47:25Z) - LC-Protonets: Multi-label Few-shot learning for world music audio tagging [65.72891334156706]
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification.
LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items.
Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music.
arXiv Detail & Related papers (2024-09-17T15:13:07Z) - Text-Guided Mixup Towards Long-Tailed Image Categorization [7.207351201912651]
In many real-world applications, the frequency distribution of class labels for training data can exhibit a long-tailed distribution.
We propose a novel text-guided mixup technique that takes advantage of the semantic relations between classes recognized by the pre-trained text encoder.
arXiv Detail & Related papers (2024-09-05T14:37:43Z) - Fair Few-shot Learning with Auxiliary Sets [53.30014767684218]
In many machine learning (ML) tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance.
In this paper, we define the fairness-aware learning task with limited training samples as the fair few-shot learning problem.
We devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks.
arXiv Detail & Related papers (2023-08-28T06:31:37Z) - A Multi-label Continual Learning Framework to Scale Deep Learning
Approaches for Packaging Equipment Monitoring [57.5099555438223]
We study multi-label classification in the continual scenario for the first time.
We propose an efficient approach that has a logarithmic complexity with regard to the number of tasks.
We validate our approach on a real-world multi-label forecasting problem from the packaging industry.
arXiv Detail & Related papers (2022-08-08T15:58:39Z) - Self-training with Few-shot Rationalization: Teacher Explanations Aid
Student in Few-shot NLU [88.8401599172922]
We develop a framework based on self-training language models with limited task-specific labels and rationales.
We show that the neural model performance can be significantly improved by making it aware of its rationalized predictions.
arXiv Detail & Related papers (2021-09-17T00:36:46Z) - Sense and Learn: Self-Supervision for Omnipresent Sensors [9.442811508809994]
We present a framework named Sense and Learn for representation or feature learning from raw sensory data.
It consists of several auxiliary tasks that can learn high-level and broadly useful features entirely from unannotated data without any human involvement in the tedious labeling process.
Our methodology achieves results that are competitive with supervised approaches and, in most cases, closes the gap by fine-tuning the network while learning the downstream tasks.
arXiv Detail & Related papers (2020-09-28T11:57:43Z) - Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation efforts by learning to count in the crowd from limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z) - NeuCrowd: Neural Sampling Network for Representation Learning with
Crowdsourced Labels [19.345894148534335]
We propose NeuCrowd, a unified framework for supervised representation learning (SRL) from crowdsourced labels.
The proposed framework is evaluated on both one synthetic and three real-world data sets.
arXiv Detail & Related papers (2020-03-21T13:38:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.