Fix-A-Step: Semi-supervised Learning from Uncurated Unlabeled Data
- URL: http://arxiv.org/abs/2208.11870v3
- Date: Thu, 25 May 2023 18:23:40 GMT
- Title: Fix-A-Step: Semi-supervised Learning from Uncurated Unlabeled Data
- Authors: Zhe Huang, Mary-Joy Sidhom, Benjamin S. Wessler, Michael C. Hughes
- Abstract summary: In real applications like medical imaging, unlabeled data will be collected for expediency and thus uncurated.
We introduce Fix-A-Step, a procedure that views all uncurated unlabeled images as potentially helpful.
On a new medical SSL benchmark called Heart2Heart, Fix-A-Step can learn from 353,500 truly uncurated ultrasound images to deliver gains that generalize across hospitals.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised learning (SSL) promises improved accuracy compared to
training classifiers on small labeled datasets by also training on many
unlabeled images. In real applications like medical imaging, unlabeled data
will be collected for expediency and thus uncurated: possibly different from
the labeled set in classes or features. Unfortunately, modern deep SSL often
makes accuracy worse when given uncurated unlabeled data. Recent complex
remedies try to detect out-of-distribution unlabeled images and then discard or
downweight them. Instead, we introduce Fix-A-Step, a simpler procedure that
views all uncurated unlabeled images as potentially helpful. Our first insight
is that even uncurated images can yield useful augmentations of labeled data.
Second, we modify gradient descent updates to prevent optimizing a multi-task
SSL loss from hurting labeled-set accuracy. Fix-A-Step can repair many common
deep SSL methods, improving accuracy on CIFAR benchmarks across all tested
methods and levels of artificial class mismatch. On a new medical SSL benchmark
called Heart2Heart, Fix-A-Step can learn from 353,500 truly uncurated
ultrasound images to deliver gains that generalize across hospitals.
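The modified gradient update described in the abstract can be sketched as a conflict-checked descent step: the combined labeled-plus-unlabeled gradient is applied only when it does not oppose the labeled-loss gradient. This is a minimal sketch in plain Python, assuming gradients are lists of floats; the function name and the exact fallback rule are illustrative, inferred from the abstract rather than taken from the authors' implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def fix_a_step_update(params, grad_labeled, grad_unlabeled, lr=0.1):
    """One SGD step that never lets the unlabeled-loss term push
    against the labeled-loss descent direction (illustrative sketch)."""
    combined = [gl + gu for gl, gu in zip(grad_labeled, grad_unlabeled)]
    # If the combined gradient conflicts with the labeled gradient
    # (negative dot product), fall back to the labeled gradient alone.
    step = combined if dot(combined, grad_labeled) >= 0 else grad_labeled
    return [p - lr * g for p, g in zip(params, step)]
```

In a real deep-learning setting the same check would be done on flattened per-parameter gradients from two backward passes, one for the labeled loss and one for the unlabeled SSL loss.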
Related papers
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness
for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) is an effective way to leverage abundant unlabeled data alongside extremely scarce labeled data.
Most SSL methods are based on instance-wise consistency between different data transformations.
We propose FlatMatch, which minimizes a cross-sharpness measure to ensure consistent learning performance between the labeled and unlabeled datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- Semi-supervised Pathological Image Segmentation via Cross Distillation of Multiple Attentions [19.236045479697797]
We propose a novel Semi-Supervised Learning (SSL) method based on Cross Distillation of Multiple Attentions (CDMA).
Our proposed CDMA was compared with eight state-of-the-art SSL methods on the public DigestPath dataset.
arXiv Detail & Related papers (2023-05-30T08:23:07Z)
- On Non-Random Missing Labels in Semi-Supervised Learning [114.62655062520425]
Semi-Supervised Learning (SSL) is fundamentally a missing label problem.
We explicitly incorporate "class" into SSL.
Our method not only significantly outperforms existing baselines but also surpasses other label-bias-removal SSL methods.
arXiv Detail & Related papers (2022-06-29T22:01:29Z)
- Robust Deep Semi-Supervised Learning: A Brief Introduction [63.09703308309176]
Semi-supervised learning (SSL) aims to improve learning performance by leveraging unlabeled data when labels are insufficient.
SSL with deep models has proven to be successful on standard benchmark tasks.
However, deep SSL methods remain vulnerable to various robustness threats in real-world applications.
arXiv Detail & Related papers (2022-02-12T04:16:41Z)
- ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification [22.5935068122522]
We propose a new SSL algorithm, called anti-curriculum pseudo-labelling (ACPL).
ACPL introduces novel techniques to select informative unlabelled samples, improving training balance and allowing the model to work for both multi-label and multi-class problems.
Our method outperforms previous SOTA SSL methods on both datasets.
arXiv Detail & Related papers (2021-11-25T05:31:52Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that selects which unlabeled examples to train on via a dynamic confidence threshold.
Our proposed approach, Dash, is adaptive in its selection of unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning [59.58381904522967]
We propose a novel embedding based generative model with a tight visual-semantic coupling constraint.
We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces.
Our method can be easily extended to the transductive ZSL setting by generating labels for unseen images.
arXiv Detail & Related papers (2020-09-16T03:54:12Z)
- FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence [93.91751021370638]
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance.
In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling.
Our algorithm, FixMatch, first generates pseudo-labels from the model's predictions on weakly-augmented unlabeled images, retains only high-confidence pseudo-labels, and then trains the model to predict them on strongly-augmented views of the same images.
arXiv Detail & Related papers (2020-01-21T18:32:27Z)
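The pseudo-labeling step in the FixMatch summary above can be sketched as a confidence-thresholded argmax over the model's predicted class probabilities. This is a minimal sketch in plain Python; the helper names are illustrative and not from the FixMatch codebase, though the 0.95 default matches the confidence threshold used in the FixMatch paper.

```python
def pseudo_label(probs, threshold=0.95):
    """Return the argmax class index if the model is confident enough,
    otherwise None (the example is dropped from the unlabeled loss)."""
    conf = max(probs)
    if conf >= threshold:
        return probs.index(conf)
    return None

def fixmatch_targets(weak_probs_batch, threshold=0.95):
    """Pseudo-labels computed from weakly-augmented views; the consistency
    loss is then taken against the model's predictions on strongly-augmented
    views of the same images, restricted to the retained (non-None) entries."""
    return [pseudo_label(p, threshold) for p in weak_probs_batch]
```

Low-confidence examples simply contribute nothing to the unlabeled loss, which is what makes the thresholding act as an implicit curriculum.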
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.