Neighborhood-based Pooling for Population-level Label Distribution Learning
- URL: http://arxiv.org/abs/2003.07406v2
- Date: Wed, 29 Apr 2020 23:13:00 GMT
- Title: Neighborhood-based Pooling for Population-level Label Distribution Learning
- Authors: Tharindu Cyril Weerasooriya, Tong Liu, Christopher M. Homan
- Abstract summary: Supervised machine learning often requires human-annotated data.
Population-level label distribution learning treats the collection of annotations for each data item as a sample of the opinions of a population of human annotators.
We propose an algorithmic framework and new statistical tests for PLDL that account for sampling size.
- Score: 5.790608871289107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised machine learning often requires human-annotated data. While
annotator disagreement is typically interpreted as evidence of noise,
population-level label distribution learning (PLDL) treats the collection of
annotations for each data item as a sample of the opinions of a population of
human annotators, among whom disagreement may be proper and expected, even with
no noise present. From this perspective, a typical training set may contain a
large number of very small-sized samples, one for each data item, none of
which, by itself, is large enough to be considered representative of the
underlying population's beliefs about that item. We propose an algorithmic
framework and new statistical tests for PLDL that account for sampling size. We
apply them to previously proposed methods for sharing labels across similar
data items. We also propose new approaches for label sharing, which we call
neighborhood-based pooling.
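The abstract does not spell out the pooling procedure, but the core idea can be illustrated with a small sketch: each item's handful of annotations is combined with the annotations of its nearest neighbors in some feature space, yielding a larger, more representative sample of label opinions per item. The Euclidean metric, the choice of k, and the data layout below are illustrative assumptions, not the authors' actual method:

```python
from collections import Counter
import math

def pool_neighborhood(features, label_counts, k=2):
    """Pool each item's annotation counts with those of its k nearest
    neighbors in feature space (assumed Euclidean metric), then
    normalize into one label distribution per item."""
    pooled = []
    for i, fi in enumerate(features):
        # Rank all other items by distance to item i.
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        neighborhood = [i] + [j for _, j in dists[:k]]
        # Sum raw annotation counts across the neighborhood.
        counts = Counter()
        for j in neighborhood:
            counts.update(label_counts[j])
        total = sum(counts.values())
        pooled.append({lbl: c / total for lbl, c in counts.items()})
    return pooled

# Toy example: items 0 and 1 are close in feature space, item 2 is distant.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
label_counts = [{"pos": 2, "neg": 1}, {"pos": 1, "neg": 2}, {"neg": 3}]
pooled = pool_neighborhood(features, label_counts, k=1)
```

Here item 0 borrows item 1's three annotations, so its pooled distribution rests on six labels rather than three; how to pick the neighborhood and test whether pooled samples are representative is exactly what the paper's statistical tests address.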
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our intriguing findings highlight the usage of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z) - Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
arXiv Detail & Related papers (2023-07-25T19:40:41Z) - Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning [8.530934084017966]
We introduce CrowdOpinion, an unsupervised learning approach that uses language features and label distributions to pool similar items into larger samples of label distributions.
We use five publicly available benchmark datasets (with varying levels of annotator disagreements) from social media.
We also experiment in the wild using a dataset from Facebook, where annotations come from the platform itself by users reacting to posts.
arXiv Detail & Related papers (2023-07-07T22:09:46Z) - Utilizing supervised models to infer consensus labels and their quality from data with multiple annotators [16.79939549201032]
Real-world data for classification is often labeled by multiple annotators.
We introduce CROWDLAB, a straightforward approach to inferring consensus labels and their quality from such data.
Our proposed method provides superior estimates than many alternative algorithms.
arXiv Detail & Related papers (2022-10-13T07:54:07Z) - Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z) - Instance-Dependent Partial Label Learning [69.49681837908511]
Partial label learning is a typical weakly supervised learning problem.
Most existing approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels.
In this paper, we consider instance-dependent partial label learning and assume that each example is associated with a latent label distribution in which a real number for each label represents the degree to which that label describes the instance.
arXiv Detail & Related papers (2021-10-25T12:50:26Z) - On Releasing Annotator-Level Labels and Information in Datasets [6.546195629698355]
We show that label aggregation may introduce representational biases of individual and group perspectives.
We propose recommendations for increased utility and transparency of datasets for downstream use cases.
arXiv Detail & Related papers (2021-10-12T02:35:45Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z) - Active Crowd Counting with Limited Supervision [13.09054893296829]
We present an active learning framework which enables accurate crowd counting with limited supervision.
We first introduce an active labeling strategy to annotate the most informative images in the dataset and learn the counting model upon them.
In the last cycle, when the labeling budget is met, the large amount of unlabeled data is also utilized.
arXiv Detail & Related papers (2020-07-13T12:07:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.