Semi-Supervised Learning of Classifiers from a Statistical Perspective:
A Brief Review
- URL: http://arxiv.org/abs/2104.04046v1
- Date: Thu, 8 Apr 2021 20:41:57 GMT
- Authors: Daniel Ahfock, Geoffrey J. McLachlan
- Abstract summary: We provide here a review of statistical SSL approaches to forming a classifier.
We focus on the recent result that a classifier formed from a partially classified sample can actually have a smaller expected error rate than if the sample were completely classified.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been increasing attention in machine learning to
semi-supervised learning (SSL) approaches for forming a classifier when the
training data consist of a limited number of classified observations but a much
larger number of unclassified observations. This is because the procurement of
classified data can be quite costly, owing to high acquisition costs and the
subsequent financial, time, and ethical issues that can arise in attempts to
provide the true class labels for the unclassified data that have been
acquired. We provide here a review of statistical SSL approaches to this
problem, focussing on the recent result that a classifier formed from a
partially classified sample can actually have a smaller expected error rate
than if the sample were completely classified.
Related papers
- Selective Classification Under Distribution Shifts [2.6541808384534478]
In selective classification, a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors.
We propose an SC framework that takes into account distribution shifts.
We show that our proposed score functions are more effective and reliable than the existing ones for generalized SC.
arXiv Detail & Related papers (2024-05-08T15:52:50Z)
- Exploring Vacant Classes in Label-Skewed Federated Learning [113.65301899666645]
Label skews, characterized by disparities in local label distribution across clients, pose a significant challenge in federated learning.
This paper introduces FedVLS, a novel approach to label-skewed federated learning that integrates vacant-class distillation and logit suppression simultaneously.
arXiv Detail & Related papers (2024-01-04T16:06:31Z)
- Generalization Bounds for Few-Shot Transfer Learning with Pretrained Classifiers [26.844410679685424]
We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes.
We show that the few-shot error of the learned feature map on new classes is small in case of class-feature-variability collapse.
arXiv Detail & Related papers (2022-12-23T18:46:05Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Class-Imbalanced Complementary-Label Learning via Weighted Loss [8.934943507699131]
Complementary-label learning (CLL) is widely used in weakly supervised classification.
It faces a significant challenge in real-world datasets when confronted with class-imbalanced training samples.
We propose a novel problem setting that enables learning from class-imbalanced complementary labels for multi-class classification.
arXiv Detail & Related papers (2022-09-28T16:02:42Z)
- PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification [64.39761523935613]
We propose a percentile-based threshold adjusting scheme to dynamically alter the score thresholds of positive and negative pseudo-labels for each class during the training.
We achieve strong performance on Pascal VOC2007 and MS-COCO datasets when compared to recent SSL methods.
arXiv Detail & Related papers (2022-08-30T01:27:48Z)
- Complementing Semi-Supervised Learning with Uncertainty Quantification [6.612035830987296]
We propose a novel unsupervised uncertainty-aware objective that relies on aleatoric and epistemic uncertainty quantification.
Our results outperform the state-of-the-art results on complex datasets such as CIFAR-100 and Mini-ImageNet.
arXiv Detail & Related papers (2022-07-22T00:15:02Z)
- OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning [110.40285771431687]
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning.
Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data.
This work introduces OpenLDN that utilizes a pairwise similarity loss to discover novel classes.
arXiv Detail & Related papers (2022-07-05T18:51:05Z)
- Learning from Multiple Unlabeled Datasets with Partial Risk Regularization [80.54710259664698]
In this paper, we aim to learn an accurate classifier without any class labels.
We first derive an unbiased estimator of the classification risk that can be estimated from the given unlabeled sets.
We then find that the classifier obtained as such tends to cause overfitting as its empirical risks go negative during training.
Experiments demonstrate that our method effectively mitigates overfitting and outperforms state-of-the-art methods for learning from multiple unlabeled sets.
arXiv Detail & Related papers (2022-07-04T16:22:44Z)
- Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift [0.0]
A state-of-the-art semi-supervised learning algorithm is applied to the morphological classification of radio galaxies.
We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art.
Improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes.
arXiv Detail & Related papers (2022-04-19T11:38:22Z)
- A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning [83.1490247844899]
Generalized Zero-Shot Learning (GZSL) is a challenging topic that has promising prospects in many realistic scenarios.
We propose a boundary based Out-of-Distribution (OOD) classifier which classifies the unseen and seen domains by only using seen samples for training.
We extensively validate our approach on five popular benchmark datasets including AWA1, AWA2, CUB, FLO and SUN.
arXiv Detail & Related papers (2020-08-09T11:27:19Z)