Bayesian Semi-supervised Crowdsourcing
- URL: http://arxiv.org/abs/2012.11048v1
- Date: Sun, 20 Dec 2020 23:18:51 GMT
- Title: Bayesian Semi-supervised Crowdsourcing
- Authors: Panagiotis A. Traganitis and Georgios B. Giannakis
- Abstract summary: Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
- Score: 71.20185379303479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowdsourcing has emerged as a powerful paradigm for efficiently labeling
large datasets and performing various learning tasks, by leveraging crowds of
human annotators. When additional information is available about the data,
semi-supervised crowdsourcing approaches that enhance the aggregation of labels
from human annotators are well motivated. This work deals with semi-supervised
crowdsourced classification, under two regimes of semi-supervision: a) label
constraints, that provide ground-truth labels for a subset of data; and b)
potentially easier to obtain instance-level constraints, that indicate
relationships between pairs of data. Bayesian algorithms based on variational
inference are developed for each regime, and their quantifiably improved
performance, compared to unsupervised crowdsourcing, is analytically and
empirically validated on several crowdsourcing datasets.
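The label-constraints regime can be pictured with a minimal, hypothetical sketch: an EM-style Dawid-Skene label aggregator in which the posteriors of the constrained items are clamped to their known ground-truth labels. This is not the paper's variational Bayes algorithm; the function name, arguments, initialization, and smoothing constants below are illustrative assumptions.

```python
# Hypothetical sketch of label aggregation with label constraints
# (EM-style Dawid-Skene with clamped items), not the paper's algorithm.
import numpy as np


def aggregate_with_label_constraints(annotations, K, known=None, n_iter=50):
    """annotations: (N, M) int array of worker labels in {0..K-1}, -1 = missing.
    known: dict {item_index: ground_truth_label} for the constrained subset."""
    N, M = annotations.shape
    known = known or {}

    # Initialize item-label posteriors from per-item vote frequencies.
    q = np.full((N, K), 1.0 / K)
    for i in range(N):
        obs = annotations[i][annotations[i] >= 0]
        if obs.size:
            q[i] = np.bincount(obs, minlength=K) / obs.size
    for i, y in known.items():
        q[i] = np.eye(K)[y]  # clamp constrained items to their known labels

    for _ in range(n_iter):
        # M-step: class prior and per-worker confusion matrices from expected counts.
        prior = q.sum(axis=0) + 0.1  # small pseudo-counts for smoothing
        prior /= prior.sum()
        conf = np.full((M, K, K), 1e-2)
        for m in range(M):
            for i in np.where(annotations[:, m] >= 0)[0]:
                conf[m, :, annotations[i, m]] += q[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: refresh posteriors of unconstrained items only.
        for i in range(N):
            if i in known:
                continue
            log_post = np.log(prior)
            for m in np.where(annotations[i] >= 0)[0]:
                log_post += np.log(conf[m, :, annotations[i, m]])
            log_post -= log_post.max()
            q[i] = np.exp(log_post) / np.exp(log_post).sum()

    return q.argmax(axis=1), q
```

As a usage example under these assumptions, calling aggregate_with_label_constraints(A, K=2, known={0: 1}) on an annotation matrix A returns estimated labels (clamped where constrained, inferred elsewhere) together with their posterior probabilities.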
Related papers
- A Dataset for the Validation of Truth Inference Algorithms Suitable for Online Deployment [76.04306818209753]
We introduce a substantial crowdsourcing annotation dataset collected from a real-world crowdsourcing platform.
This dataset comprises approximately two thousand workers, one million tasks, and six million annotations.
We evaluate the effectiveness of several representative truth inference algorithms on this dataset.
arXiv Detail & Related papers (2024-03-10T16:00:41Z)
- Semi-supervised Semantic Segmentation via Boosting Uncertainty on Unlabeled Data [6.318105712690353]
We provide an analysis on the labeled and unlabeled distributions in training datasets.
We propose two strategies and design an uncertainty booster algorithm, specifically for semi-supervised semantic segmentation.
Our approach achieves state-of-the-art performance compared with current semi-supervised semantic segmentation methods in our experiments.
arXiv Detail & Related papers (2023-11-30T18:01:03Z)
- Weakly Supervised Video Anomaly Detection Based on Cross-Batch Clustering Guidance [39.43891080713327]
Weakly supervised video anomaly detection (WSVAD) is a challenging task since only video-level labels are available for training.
We propose a novel WSVAD method based on cross-batch clustering guidance.
arXiv Detail & Related papers (2022-12-16T14:38:30Z)
- Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have been shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z)
- On Releasing Annotator-Level Labels and Information in Datasets [6.546195629698355]
We show that label aggregation may introduce representational biases of individual and group perspectives.
We propose recommendations for increased utility and transparency of datasets for downstream use cases.
arXiv Detail & Related papers (2021-10-12T02:35:45Z)
- Deep Clustering based Fair Outlier Detection [19.601280507914325]
We propose an instance-level weighted representation learning strategy to enhance joint deep clustering and outlier detection.
Our DCFOD method consistently achieves superior performance on both outlier detection validity and two fairness notions in outlier detection.
arXiv Detail & Related papers (2021-06-09T15:12:26Z)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)