Radio Galaxy Zoo: Using semi-supervised learning to leverage large
unlabelled data-sets for radio galaxy classification under data-set shift
- URL: http://arxiv.org/abs/2204.08816v3
- Date: Thu, 21 Apr 2022 10:24:08 GMT
- Title: Radio Galaxy Zoo: Using semi-supervised learning to leverage large
unlabelled data-sets for radio galaxy classification under data-set shift
- Authors: Inigo V. Slijepcevic, Anna M. M. Scaife, Mike Walmsley, Micah Bowles,
Ivy Wong, Stanislav S. Shabala and Hongming Tang
- Abstract summary: A state-of-the-art semi-supervised learning (SSL) algorithm is applied to the morphological classification of radio galaxies.
We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art.
Improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we examine the classification accuracy and robustness of a
state-of-the-art semi-supervised learning (SSL) algorithm applied to the
morphological classification of radio galaxies. We test if SSL with fewer
labels can achieve test accuracies comparable to the supervised
state-of-the-art and whether this holds when incorporating previously unseen
data. We find that for the radio galaxy classification problem considered, SSL
provides additional regularisation and outperforms the baseline test accuracy.
However, in contrast to model performance metrics reported on computer science
benchmarking data-sets, we find that improvement is limited to a narrow range
of label volumes, with performance falling off rapidly at low label volumes.
Additionally, we show that SSL does not improve model calibration, regardless
of whether classification is improved. Moreover, we find that when different
underlying catalogues drawn from the same radio survey are used to provide the
labelled and unlabelled data-sets required for SSL, a significant drop in
classification performance is observed, highlighting the difficulty of
applying SSL techniques under data-set shift. We show that a class-imbalanced
unlabelled data pool negatively affects performance through prior probability
shift, which we suggest may explain this performance drop. We also find that
using the Fréchet Distance between labelled and unlabelled data-sets as a
measure of data-set shift can provide a prediction of model performance;
however, for typical radio galaxy data-sets with labelled sample volumes of
O(1000), the sample variance associated with this technique is high and the
technique is in general not sufficiently robust to replace a train-test cycle.
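
The Fréchet Distance referred to above has a closed form when each data-set is approximated by a Gaussian fit to some feature representation, as in the Fréchet Inception Distance. The sketch below is a minimal illustration of that computation, not the authors' implementation; the choice of feature extractor producing the (n_samples, n_features) arrays is an assumption.

```python
# Minimal sketch (not the paper's code): Frechet Distance between Gaussians
# fit to feature representations of the labelled and unlabelled data-sets.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_labelled, feats_unlabelled):
    """Closed-form Frechet Distance between Gaussians fit to two
    (n_samples, n_features) feature arrays."""
    mu1, mu2 = feats_labelled.mean(axis=0), feats_unlabelled.mean(axis=0)
    sigma1 = np.cov(feats_labelled, rowvar=False)
    sigma2 = np.cov(feats_unlabelled, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # sqrtm can return tiny imaginary noise
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Because the means and covariances are estimated from finite samples, labelled sets of O(1000) sources yield noisy distance estimates, which is consistent with the high sample variance reported above.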
Related papers
- Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning [6.904448748214652]
Semi-supervised learning algorithms struggle to perform well when exposed to imbalanced training data.
We introduce SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL).
SEVAL adapts to specific tasks with improved pseudo-label accuracy and ensures pseudo-label correctness on a per-class basis.
arXiv Detail & Related papers (2024-07-07T13:46:22Z)
- Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation [87.17768598044427]
Traditional semi-supervised learning assumes that the feature distributions of labeled and unlabeled data are consistent.
We propose Self-Supervised Feature Adaptation (SSFA), a generic framework for improving SSL performance when labeled and unlabeled data come from different distributions.
Our proposed SSFA is applicable to various pseudo-label-based SSL learners and significantly improves performance on labeled, unlabeled, and even unseen distributions.
arXiv Detail & Related papers (2024-05-31T03:13:45Z)
- Semi-Supervised Learning in the Few-Shot Zero-Shot Scenario [14.916971861796384]
Semi-Supervised Learning (SSL) is a framework that utilizes both labeled and unlabeled data to enhance model performance.
We propose a general approach to augment existing SSL methods, enabling them to handle situations where certain classes are missing.
Our experimental results reveal significant improvements in accuracy when compared to state-of-the-art SSL, open-set SSL, and open-world SSL methods.
arXiv Detail & Related papers (2023-08-27T14:25:07Z)
- Complementing Semi-Supervised Learning with Uncertainty Quantification [6.612035830987296]
We propose a novel unsupervised uncertainty-aware objective that relies on aleatoric and epistemic uncertainty quantification.
Our method outperforms the state of the art on complex datasets such as CIFAR-100 and Mini-ImageNet.
arXiv Detail & Related papers (2022-07-22T00:15:02Z)
- ADT-SSL: Adaptive Dual-Threshold for Semi-Supervised Learning [68.53717108812297]
Semi-Supervised Learning (SSL) has advanced classification tasks by using both labeled and unlabeled data to train a model jointly.
This paper proposes an Adaptive Dual-Threshold method for Semi-Supervised Learning (ADT-SSL).
Experimental results show that the proposed ADT-SSL achieves state-of-the-art classification accuracy.
arXiv Detail & Related papers (2022-05-21T11:52:08Z)
- Robust Deep Semi-Supervised Learning: A Brief Introduction [63.09703308309176]
Semi-supervised learning (SSL) aims to improve learning performance by leveraging unlabeled data when labels are insufficient.
SSL with deep models has proven to be successful on standard benchmark tasks.
However, such models are still vulnerable to various robustness threats in real-world applications.
arXiv Detail & Related papers (2022-02-12T04:16:41Z)
- Can semi-supervised learning reduce the amount of manual labelling required for effective radio galaxy morphology classification? [0.0]
We test whether SSL can achieve performance comparable to the current supervised state of the art when using many fewer labelled data points.
We find that although SSL provides additional regularisation, its performance degrades rapidly when using very few labels.
arXiv Detail & Related papers (2021-11-08T09:36:48Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that selects which unlabeled examples are used to train the model.
Our proposed approach, Dash, is adaptive in its selection of unlabeled data via a dynamically adjusted threshold.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [126.31716228319902]
We develop the Distribution Aligning Refinery of Pseudo-label (DARP) algorithm.
We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes.
arXiv Detail & Related papers (2020-07-17T09:16:05Z)
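
DARP and the prior probability shift discussed in the abstract above both concern the mismatch between the class distribution of pseudo-labels and the true class prior. DARP itself refines pseudo-labels by solving an explicit optimisation problem; the sketch below shows only the simplest form of the underlying idea, reweighting pseudo-label probabilities towards a target class prior and renormalising, with all names illustrative.

```python
# Illustrative sketch only (not DARP): naive prior alignment of pseudo-labels,
# countering prior probability shift from a class-imbalanced unlabelled pool.
import numpy as np

def align_pseudo_labels(probs, target_prior):
    """Rescale an (n_samples, n_classes) array of pseudo-label probabilities
    towards a target class prior, then renormalise each row."""
    current_prior = probs.mean(axis=0)  # empirical marginal of the pseudo-labels
    weights = target_prior / np.clip(current_prior, 1e-12, None)
    aligned = probs * weights           # upweight under-represented classes
    return aligned / aligned.sum(axis=1, keepdims=True)
```

Hard pseudo-labels would then be taken as the argmax of the aligned probabilities; the target prior must itself be estimated, for example from the labelled set.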