Iterative Pseudo-Labeling with Deep Feature Annotation and
Confidence-Based Sampling
- URL: http://arxiv.org/abs/2109.02717v1
- Date: Mon, 6 Sep 2021 20:02:13 GMT
- Title: Iterative Pseudo-Labeling with Deep Feature Annotation and
Confidence-Based Sampling
- Authors: Barbara C Benato and Alexandru C Telea and Alexandre X Falcão
- Abstract summary: Training deep neural networks is challenging when large and annotated datasets are unavailable.
We improve a recent iterative pseudo-labeling technique, Deep Feature Annotation (DeepFA), by selecting the most confident unsupervised samples to iteratively train a deep neural network.
We first ascertain the best configuration for the baseline -- a self-trained deep neural network -- and then evaluate our confidence DeepFA for different confidence thresholds.
- Score: 127.46527972920383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural networks is challenging when large and annotated
datasets are unavailable. Extensive manual annotation of data samples is
time-consuming, expensive, and error-prone, notably when it needs to be done by
experts. To address this issue, increased attention has been devoted to
techniques that propagate uncertain labels (also called pseudo labels) to large
amounts of unsupervised samples and use them for training the model. However,
these techniques still need hundreds of supervised samples per class in the
training set and a validation set with extra supervised samples to tune the
model. We improve a recent iterative pseudo-labeling technique, Deep Feature
Annotation (DeepFA), by selecting the most confident unsupervised samples to
iteratively train a deep neural network. Our confidence-based sampling strategy
relies on only dozens of annotated training samples per class with no
validation set, considerably reducing user effort in data annotation. We first
ascertain the best configuration for the baseline -- a self-trained deep neural
network -- and then evaluate our confidence DeepFA for different confidence
thresholds. Experiments on six datasets show that DeepFA already outperforms
the self-trained baseline, but confidence DeepFA can considerably outperform
the original DeepFA and the baseline.
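The confidence-based sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: this summary does not say how DeepFA computes confidence, so the sketch assumes each unsupervised sample already has one score per class. It keeps only the samples whose top score reaches a threshold. The function name `select_confident`, the example scores, and the threshold value are all hypothetical.

```python
from typing import List, Sequence, Tuple


def select_confident(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int, float]]:
    """Return (sample_index, pseudo_label, confidence) for confident samples.

    Assumption: probs[i][c] is a confidence score for sample i and class c.
    How such scores are obtained is outside the scope of this sketch.
    """
    selected = []
    for i, row in enumerate(probs):
        # The pseudo label is the class with the highest score.
        label = max(range(len(row)), key=row.__getitem__)
        confidence = row[label]
        # Keep the sample only if its top score reaches the threshold.
        if confidence >= threshold:
            selected.append((i, label, confidence))
    return selected


# Hypothetical scores for four unsupervised samples and three classes.
example_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.15, 0.75],
    [0.02, 0.97, 0.01],
]
picked = select_confident(example_probs, threshold=0.7)
print(picked)
```

In an iterative scheme of this kind, the selected samples and their pseudo labels would be added to the training set, the network retrained, and the selection repeated. A higher threshold admits fewer but more reliable pseudo labels, which is the trade-off the abstract's confidence thresholds explore.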
Related papers
- Are Sparse Neural Networks Better Hard Sample Learners? [24.2141078613549]
Hard samples play a crucial role in the optimal performance of deep neural networks.
Most SNNs trained on challenging samples can often match or surpass dense models in accuracy at certain sparsity levels.
arXiv Detail & Related papers (2024-09-13T21:12:18Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- DE-CROP: Data-efficient Certified Robustness for Pretrained Classifiers [21.741026088202126]
We propose a novel way to certify the robustness of pretrained models using only a few training samples.
Our proposed approach generates class-boundary and interpolated samples corresponding to each training sample.
We obtain significant improvements over the baseline on multiple benchmark datasets and also report similar performance under the challenging black box setup.
arXiv Detail & Related papers (2022-10-17T10:41:18Z)
- Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization [88.74813798138466]
Localizing keypoints of an object is a basic visual problem.
Supervised learning of a keypoint localization network often requires a large amount of data.
We propose to automatically select reliable pseudo-labeled samples with a series of dynamic thresholds.
arXiv Detail & Related papers (2022-01-21T09:51:58Z)
- Semi-supervised deep learning based on label propagation in a 2D embedded space [117.9296191012968]
Proposed solutions propagate labels from a small set of supervised images to a large set of unsupervised ones to train a deep neural network model.
We present a loop in which a deep neural network (VGG-16) is trained from a set with more correctly labeled samples along iterations.
As the labeled set improves along iterations, it improves the features of the neural network.
arXiv Detail & Related papers (2020-08-02T20:08:54Z)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class for each task for training and for validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)