Related papers: Semi-Supervised Learning of Classifiers from a Statistical Perspective: A Brief Review

URL: http://arxiv.org/abs/2104.04046v1
Date: Thu, 8 Apr 2021 20:41:57 GMT
Title: Semi-Supervised Learning of Classifiers from a Statistical Perspective: A Brief Review
Authors: Daniel Ahfock, Geoffrey J. McLachlan
Abstract summary: We provide here a review of statistical SSL approaches to forming a classifier. We focus on the recent result that a classifier formed from a partially classified sample can actually have a smaller expected error rate than it would if the sample were completely classified.
Score: 1.6752182911522517
License: http://creativecommons.org/licenses/by/4.0/
Abstract: There has been increasing attention to semi-supervised learning (SSL) approaches in machine learning to forming a classifier in situations where the training data for a classifier consists of a limited number of classified observations but a much larger number of unclassified observations. This is because the procurement of classified data can be quite costly due to high acquisition costs and subsequent financial, time, and ethical issues that can arise in attempts to provide the true class labels for the unclassified data that have been acquired. We provide here a review of statistical SSL approaches to this problem, focussing on the recent result that a classifier formed from a partially classified sample can actually have smaller expected error rate than that if the sample were completely classified.

Related papers

ScarceGAN: Discriminative Classification Framework for Rare Class Identification for Longitudinal Data with Weak Prior [4.2944491746735745]
ScarceGAN focuses on identification of extremely rare or scarce samples from longitudinal telemetry data with small and weak label prior. We specifically address: (i) severe scarcity in positive class, stemming from both underlying organic skew in the data, as well as extremely limited labels. For identifying risky players in skill gaming, this formulation in whole gives us a recall of over 85% (60% jump over vanilla semi-supervised GAN) on our scarce class with very minimal verbosity in the unknown space.
arXiv Detail & Related papers (2025-05-02T12:17:37Z)
Selective Classification Under Distribution Shifts [2.6541808384534478]
In selective classification, a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. We propose an SC framework that takes into account distribution shifts. We show that our proposed score functions are more effective and reliable than the existing ones for generalized SC.
arXiv Detail & Related papers (2024-05-08T15:52:50Z)
Exploring Vacant Classes in Label-Skewed Federated Learning [113.65301899666645]
Label skews, characterized by disparities in local label distribution across clients, pose a significant challenge in federated learning. This paper introduces FedVLS, a novel approach to label-skewed federated learning that integrates vacant-class distillation and logit suppression simultaneously.
arXiv Detail & Related papers (2024-01-04T16:06:31Z)
Generalization Bounds for Few-Shot Transfer Learning with Pretrained Classifiers [26.844410679685424]
We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. We show that the few-shot error of the learned feature map on new classes is small in case of class-feature-variability collapse.
arXiv Detail & Related papers (2022-12-23T18:46:05Z)
Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
Class-Imbalanced Complementary-Label Learning via Weighted Loss [8.934943507699131]
Complementary-label learning (CLL) is widely used in weakly supervised classification. It faces a significant challenge in real-world datasets when confronted with class-imbalanced training samples. We propose a novel problem setting that enables learning from class-imbalanced complementary labels for multi-class classification.
arXiv Detail & Related papers (2022-09-28T16:02:42Z)
PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification [64.39761523935613]
We propose a percentile-based threshold-adjusting scheme to dynamically alter the score thresholds of positive and negative pseudo-labels for each class during training. We achieve strong performance on Pascal VOC2007 and MS-COCO datasets when compared to recent SSL methods.
arXiv Detail & Related papers (2022-08-30T01:27:48Z)
Complementing Semi-Supervised Learning with Uncertainty Quantification [6.612035830987296]
We propose a novel unsupervised uncertainty-aware objective that relies on aleatoric and epistemic uncertainty quantification. Our results outperform the state-of-the-art results on complex datasets such as CIFAR-100 and Mini-ImageNet.
arXiv Detail & Related papers (2022-07-22T00:15:02Z)
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning [110.40285771431687]
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning. Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data. This work introduces OpenLDN that utilizes a pairwise similarity loss to discover novel classes.
arXiv Detail & Related papers (2022-07-05T18:51:05Z)
Learning from Multiple Unlabeled Datasets with Partial Risk Regularization [80.54710259664698]
In this paper, we aim to learn an accurate classifier without any class labels. We first derive an unbiased estimator of the classification risk that can be estimated from the given unlabeled sets. We then find that the classifier obtained as such tends to cause overfitting as its empirical risks go negative during training. Experiments demonstrate that our method effectively mitigates overfitting and outperforms state-of-the-art methods for learning from multiple unlabeled sets.
arXiv Detail & Related papers (2022-07-04T16:22:44Z)
Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift [0.0]
A state-of-the-art semi-supervised learning algorithm is applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art. Improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes.
arXiv Detail & Related papers (2022-04-19T11:38:22Z)
A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning [83.1490247844899]
Generalized Zero-Shot Learning (GZSL) is a challenging topic that has promising prospects in many realistic scenarios. We propose a boundary based Out-of-Distribution (OOD) classifier which classifies the unseen and seen domains by only using seen samples for training. We extensively validate our approach on five popular benchmark datasets including AWA1, AWA2, CUB, FLO and SUN.
arXiv Detail & Related papers (2020-08-09T11:27:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.