Combining Self-labeling with Selective Sampling
- URL: http://arxiv.org/abs/2301.04420v1
- Date: Wed, 11 Jan 2023 11:58:45 GMT
- Title: Combining Self-labeling with Selective Sampling
- Authors: Jędrzej Kozal, Michał Woźniak
- Abstract summary: This work combines self-labeling techniques with active learning in a selective sampling scenario.
We show that naive application of self-labeling can harm performance by introducing bias towards selected classes.
The proposed method matches current selective sampling methods or achieves better results.
- Score: 2.0305676256390934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since data is the fuel that drives machine learning models, and
access to labeled data is generally expensive, semi-supervised methods remain
consistently popular. They enable the construction of large training sets
without requiring many expert labels. This work combines self-labeling
techniques with active learning in a selective sampling scenario. We propose a new method that builds
learning in a selective sampling scenario. We propose a new method that builds
an ensemble classifier. Based on an evaluation of the inconsistency of the
decisions of the individual base classifiers for a given observation, a
decision is made on whether to request a new label or use the self-labeling. In
preliminary studies, we show that naive application of self-labeling can harm
performance by introducing bias towards selected classes and consequently lead
to skewed class distribution. Hence, we also propose mechanisms to reduce this
phenomenon. Experimental evaluation shows that the proposed method matches
current selective sampling methods or achieves better results.
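The decision rule described in the abstract can be sketched as follows. This is an illustrative reconstruction only, not the authors' implementation: the function name, the disagreement threshold, and the class-ratio guard (one plausible bias-reduction mechanism) are all hypothetical.

```python
import numpy as np

def query_or_self_label(ensemble, x, disagreement_threshold=0.3,
                        class_counts=None, max_class_ratio=2.0):
    """Decide whether to request an oracle label for x or self-label it.

    `ensemble` is a list of fitted base classifiers exposing .predict().
    Returns ("query", None) or ("self-label", predicted_class).
    Hypothetical interface -- not the paper's actual API.
    """
    # Collect one vote per base classifier for this observation.
    votes = np.array([clf.predict([x])[0] for clf in ensemble])
    classes, counts = np.unique(votes, return_counts=True)
    top = classes[np.argmax(counts)]
    disagreement = 1.0 - counts.max() / len(votes)

    # High inconsistency among base classifiers -> request a true label.
    if disagreement > disagreement_threshold:
        return "query", None

    # Bias guard (one possible mechanism): refuse to self-label a class
    # that is already heavily over-represented among labeled examples,
    # to avoid the skewed class distribution the abstract warns about.
    if class_counts is not None and len(class_counts) > 1:
        others = [c for k, c in class_counts.items() if k != top]
        if class_counts.get(top, 0) > max_class_ratio * (min(others) + 1):
            return "query", None

    return "self-label", top
```

In a selective sampling loop, each incoming observation would pass through this rule: confident, balanced predictions become cheap pseudo-labels, while inconsistent ones are routed to the expert.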
Related papers
- Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples [9.360376286221943]
We introduce an adaptive batch selection algorithm tailored to multi-label deep learning models.
Our method converges faster and performs better than random batch selection.
arXiv Detail & Related papers (2024-03-27T02:00:18Z)
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our intriguing findings highlight the usage of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
arXiv Detail & Related papers (2023-11-27T02:59:17Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Automatic Debiased Learning from Positive, Unlabeled, and Exposure Data [11.217084610985674]
We address the issue of binary classification from positive and unlabeled data (PU classification) with a selection bias in the positive data.
This scenario represents a conceptual framework for many practical applications, such as recommender systems.
We propose a method to identify the function of interest using a strong ignorability assumption and develop an "Automatic Debiased PUE" (ADPUE) learning method.
arXiv Detail & Related papers (2023-03-08T18:45:22Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning [57.436224561482966]
Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling.
We introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting.
Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings.
arXiv Detail & Related papers (2022-07-25T16:11:55Z)
- LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification [28.37907856670151]
Pseudo-labels are noisy due to their nature, so selecting the correct ones has a huge potential for performance boost.
We propose a novel pseudo-label selection method, LOPS, that takes the learning order of samples into consideration.
LOPS can be viewed as a strong performance-boosting plug-in for most existing weakly-supervised text classification methods.
arXiv Detail & Related papers (2022-05-25T06:46:48Z)
- Self-Training: A Survey [5.772546394254112]
Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations.
Among the existing techniques, self-training methods have undoubtedly attracted greater attention in recent years.
We present self-training methods for binary and multi-class classification; as well as their variants and two related approaches.
arXiv Detail & Related papers (2022-02-24T11:40:44Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
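The uncertainty half of strategies like the one above is commonly built on predictive entropy. The sketch below shows only that generic, standard component, not the minimax adversarial algorithm itself; the function name and interface are illustrative.

```python
import numpy as np

def entropy_query(prob_matrix, k):
    """Select the k most uncertain unlabeled samples by predictive entropy.

    prob_matrix: (n_samples, n_classes) array of predicted class
    probabilities. Returns indices of the k highest-entropy rows.
    """
    p = np.clip(np.asarray(prob_matrix, dtype=float), 1e-12, 1.0)
    # Shannon entropy per row; higher entropy = less confident prediction.
    entropy = -np.sum(p * np.log(p), axis=1)
    return np.argsort(entropy)[::-1][:k]
```

A near-uniform row such as [0.5, 0.5] would be queried before a confident one such as [0.99, 0.01].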
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.