Improve Learning from Crowds via Generative Augmentation
- URL: http://arxiv.org/abs/2107.10449v1
- Date: Thu, 22 Jul 2021 04:14:30 GMT
- Title: Improve Learning from Crowds via Generative Augmentation
- Authors: Zhendong Chu, Hongning Wang
- Abstract summary: Crowdsourcing provides an efficient label collection schema for supervised machine learning.
To control annotation cost, each instance in the crowdsourced data is typically annotated by a small number of annotators.
This creates a sparsity issue and limits the quality of machine learning models trained on such data.
- Score: 36.38523364192051
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Crowdsourcing provides an efficient label collection schema for supervised
machine learning. However, to control annotation cost, each instance in the
crowdsourced data is typically annotated by a small number of annotators. This
creates a sparsity issue and limits the quality of machine learning models
trained on such data. In this paper, we study how to handle sparsity in
crowdsourced data using data augmentation. Specifically, we propose to directly
learn a classifier by augmenting the raw sparse annotations. We implement two
principles of high-quality augmentation using Generative Adversarial Networks:
1) the generated annotations should follow the distribution of authentic ones,
which is measured by a discriminator; 2) the generated annotations should have
high mutual information with the ground-truth labels, which is measured by an
auxiliary network. Extensive experiments and comparisons against an array of
state-of-the-art learning from crowds methods on three real-world datasets
demonstrate the effectiveness of our data augmentation framework, showing the
potential of our algorithm for low-budget crowdsourcing in general.
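The two augmentation principles above can be sketched as a pair of loss terms: an adversarial term that pushes generated annotations toward the distribution of authentic ones (scored by a discriminator), and an InfoGAN-style term that keeps generated annotations informative about the ground-truth label via an auxiliary classifier. The following is a minimal NumPy sketch of those objectives only, assuming sigmoid discriminator scores and softmax auxiliary outputs; the function names, the weighting parameter `lam`, and the exact mutual-information lower bound used here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

EPS = 1e-8  # numerical floor to keep logs finite


def discriminator_loss(real_scores, fake_scores):
    """Binary cross-entropy for the discriminator: authentic annotations
    should score near 1, generated annotations near 0 (principle 1)."""
    return (-np.mean(np.log(real_scores + EPS))
            - np.mean(np.log(1.0 - fake_scores + EPS)))


def generator_loss(fake_scores, aux_label_probs, true_labels, lam=1.0):
    """Generator objective combining both principles.

    fake_scores:     discriminator scores for generated annotations.
    aux_label_probs: softmax output of the auxiliary network, one row of
                     class probabilities per generated annotation.
    true_labels:     integer ground-truth labels for each instance.
    lam:             (assumed) trade-off weight between the two terms.
    """
    # Principle 1: fool the discriminator, so generated annotations
    # follow the distribution of authentic ones.
    adv = -np.mean(np.log(fake_scores + EPS))
    # Principle 2: a variational lower bound on the mutual information
    # between generated annotations and ground-truth labels, estimated
    # as the cross-entropy of the auxiliary classifier's predictions.
    picked = aux_label_probs[np.arange(len(true_labels)), true_labels]
    mi = -np.mean(np.log(picked + EPS))
    return adv + lam * mi
```

In an actual training loop these losses would alternate as in standard GAN training, with the generator producing candidate annotations for sparsely labeled instances and the auxiliary network trained jointly on the mutual-information term.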
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z) - Coupled Confusion Correction: Learning from Crowds with Sparse
Annotations [43.94012824749425]
Confusion matrices learned by two models can be corrected by the distilled data from the other.
We cluster the "annotator groups" who share similar expertise so that their confusion matrices can be corrected together.
arXiv Detail & Related papers (2023-12-12T14:47:26Z) - XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z) - GraphLearner: Graph Node Clustering with Fully Learnable Augmentation [76.63963385662426]
Contrastive deep graph clustering (CDGC) leverages the power of contrastive learning to group nodes into different clusters.
We propose a Graph Node Clustering with Fully Learnable Augmentation, termed GraphLearner.
It introduces learnable augmentors to generate high-quality and task-specific augmented samples for CDGC.
arXiv Detail & Related papers (2022-12-07T10:19:39Z) - Learning to Generate Novel Classes for Deep Metric Learning [24.048915378172012]
We introduce a new data augmentation approach that synthesizes novel classes and their embedding vectors.
We implement this idea by learning and exploiting a conditional generative model, which, given a class label and a noise, produces a random embedding vector of the class.
Our proposed generator allows the loss to use richer class relations by augmenting realistic and diverse classes, resulting in better generalization to unseen samples.
arXiv Detail & Related papers (2022-01-04T06:55:19Z) - Weakly Supervised Change Detection Using Guided Anisotropic Diffusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z) - Clustering augmented Self-Supervised Learning: An application to Land
Cover Mapping [10.720852987343896]
We introduce a new method for land cover mapping by using a clustering based pretext task for self-supervised learning.
We demonstrate the effectiveness of the method on two societally relevant applications.
arXiv Detail & Related papers (2021-08-16T19:35:43Z) - Federated Self-Supervised Learning of Multi-Sensor Representations for
Embedded Intelligence [8.110949636804772]
Smartphones, wearables, and Internet of Things (IoT) devices produce a wealth of data that cannot be accumulated in a centralized repository for learning supervised models.
We propose a self-supervised approach termed scalogram-signal correspondence learning based on wavelet transform to learn useful representations from unlabeled sensor inputs.
We extensively assess the quality of learned features with our multi-view strategy on diverse public datasets, achieving strong performance in all domains.
arXiv Detail & Related papers (2020-07-25T21:59:17Z) - Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for
Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher.
The student model learns the knowledge of unlabeled target data and labeled source data by two teacher models.
We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.