Harnessing Unlabeled Data to Improve Generalization of Biometric Gender
and Age Classifiers
- URL: http://arxiv.org/abs/2110.04427v1
- Date: Sat, 9 Oct 2021 01:06:01 GMT
- Title: Harnessing Unlabeled Data to Improve Generalization of Biometric Gender
and Age Classifiers
- Authors: Aakash Varma Nadimpalli, Narsi Reddy, Sreeraj Ramachandran and Ajita
Rattani
- Abstract summary: Deep learning models need a large amount of labeled data for model training and optimal parameter estimation.
Due to privacy and security concerns, large amounts of labeled data cannot be collected for certain applications, such as those in the medical field.
We propose a self-ensemble based deep learning model that, along with limited labeled data, harnesses unlabeled data to improve generalization performance.
- Score: 0.7874708385247353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With significant advances in deep learning, many computer vision applications
have reached an inflection point. However, these deep learning models need a
large amount of labeled data for model training and optimal parameter
estimation. Limited labeled data for model training results in over-fitting and
impacts generalization performance. However, the collection and annotation of a
large amount of data is a very time-consuming and expensive operation. Further,
due to privacy and security concerns, large amounts of labeled data cannot be
collected for certain applications, such as those in the medical field.
Self-training, Co-training, and Self-ensemble methods are three types of
semi-supervised learning methods that can be used to exploit unlabeled data. In
this paper, we propose a self-ensemble based deep learning model that, along
with limited labeled data, harnesses unlabeled data to improve generalization
performance. We evaluated the proposed self-ensemble based deep-learning model
for soft-biometric gender and age classification. Experimental evaluation on
the CelebA and VISOB datasets suggests gender classification accuracies of
94.46% and 81.00%, respectively, using only 1000 labeled samples, with the
remaining 199k CelebA samples and, similarly, the remaining 107k VISOB samples
used as unlabeled data. Comparative evaluation suggests a $5.74\%$ and $8.47\%$
improvement in the accuracy of the self-ensemble model over a supervised model
trained on the entire CelebA and VISOB datasets, respectively. We also
evaluated the proposed learning method for age-group prediction on the Adience
dataset, where it outperformed the baseline supervised deep-learning model with
a better exact accuracy of 55.55 $\pm$ 4.28, which is 3.92% higher than the
baseline.
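The abstract leaves the self-ensemble training loop implicit. One common way to realize the idea (a minimal sketch under assumed names such as `model` and `aug`, not the authors' released implementation) is a Pi-model-style objective: cross-entropy on the small labeled batch plus a consistency penalty that forces two stochastic augmentations of the same unlabeled image to agree.

```python
# Pi-model-style self-ensemble step (illustrative sketch, not the paper's code).
import torch
import torch.nn.functional as F

def train_step(model, optimizer, labeled_batch, unlabeled_batch, aug, w_unsup=1.0):
    x_l, y_l = labeled_batch              # small labeled set (e.g., 1000 samples)
    x_u = unlabeled_batch                 # large unlabeled pool (e.g., 199k for CelebA)

    # Supervised term: ordinary cross-entropy on the few labeled samples.
    sup_loss = F.cross_entropy(model(aug(x_l)), y_l)

    # Consistency term: two stochastic augmentations of x_u should agree.
    p1 = F.softmax(model(aug(x_u)), dim=1)
    with torch.no_grad():                 # treat the second view as the target
        p2 = F.softmax(model(aug(x_u)), dim=1)
    unsup_loss = F.mse_loss(p1, p2)

    loss = sup_loss + w_unsup * unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice `w_unsup` is usually ramped up from zero over the first epochs so the consistency term does not swamp the supervised signal while the network is still nearly random.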
Related papers
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- Exploring Data Redundancy in Real-world Image Classification through Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv Detail & Related papers (2023-06-25T03:31:05Z)
- Uncertainty-Aware Semi-Supervised Learning for Prostate MRI Zonal Segmentation [0.9176056742068814]
We propose a novel semi-supervised learning (SSL) approach that requires only a relatively small number of annotations.
Our method uses a pseudo-labeling technique that employs recent deep learning uncertainty estimation models.
Our proposed model outperformed the semi-supervised model in experiments with the ProstateX dataset and an external test set.
arXiv Detail & Related papers (2023-05-10T08:50:04Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Is margin all you need? An extensive empirical study of active learning on tabular data [66.18464006872345]
We analyze the performance of a variety of active learning algorithms on 69 real-world datasets from the OpenML-CC18 benchmark.
Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including the current state-of-the-art (a NumPy sketch of margin sampling appears after this list).
arXiv Detail & Related papers (2022-10-07T21:18:24Z)
- A Real Use Case of Semi-Supervised Learning for Mammogram Classification in a Local Clinic of Costa Rica [0.5541644538483946]
Training a deep learning model requires a considerable number of labeled images.
A number of publicly available datasets have been built with data from different hospitals and clinics.
The use of the semi-supervised deep learning approach known as MixMatch to leverage unlabeled data is proposed and evaluated.
arXiv Detail & Related papers (2021-07-24T22:26:50Z)
- Self Training with Ensemble of Teacher Models [8.257085583227695]
In order to train robust deep learning models, large amounts of labelled data are required.
In the absence of such large repositories of labelled data, unlabeled data can be exploited instead.
Semi-supervised learning aims to utilize such unlabeled data for training classification models.
arXiv Detail & Related papers (2021-07-17T09:44:09Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network (a generic sketch of uncertainty-filtered pseudo-labeling appears after this list).
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To make training on the resulting enlarged dataset practical, we propose to apply a dataset distillation strategy that compresses it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
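The margin sampling technique highlighted in the tabular-data study above is simple enough to sketch in a few lines of NumPy (function and variable names are illustrative):

```python
# Margin-based active learning query selection (illustrative sketch).
import numpy as np

def margin_query(probs: np.ndarray, n_queries: int) -> np.ndarray:
    """probs: (n_samples, n_classes) predicted class probabilities.
    Returns indices of the n_queries least decisive samples."""
    sorted_probs = np.sort(probs, axis=1)                # ascending per row
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]   # top-1 minus top-2
    return np.argsort(margin)[:n_queries]                # smallest margins first

# Example: the near-tied rows 0 and 2 are queried for labeling.
probs = np.array([[0.50, 0.50], [0.90, 0.10], [0.60, 0.40], [0.95, 0.05]])
print(margin_query(probs, 2))                            # -> [0 2]
```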
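Several entries above (the prostate MRI segmentation and few-label text classification papers) hinge on uncertainty-filtered pseudo-labels. A generic realization using Monte Carlo dropout (an assumed estimator; each paper uses its own) could look like the following sketch:

```python
# Uncertainty-aware pseudo-labeling via MC dropout (illustrative sketch;
# the cited papers use their own uncertainty estimators).
import torch
import torch.nn.functional as F

@torch.no_grad()
def pseudo_label(model, x_u, n_passes=10, min_conf=0.9, max_std=0.05):
    model.train()  # keep dropout layers active so forward passes differ
    probs = torch.stack([F.softmax(model(x_u), dim=1) for _ in range(n_passes)])
    mean, std = probs.mean(dim=0), probs.std(dim=0)
    conf, labels = mean.max(dim=1)
    stable = std.gather(1, labels.unsqueeze(1)).squeeze(1) < max_std
    keep = (conf > min_conf) & stable   # confident AND stable across passes
    return x_u[keep], labels[keep]      # accepted samples and their pseudo-labels
```

The accepted pairs are then added to the labeled pool for the next round of training; everything rejected stays unlabeled and can be re-screened later.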