Accurately Classifying Out-Of-Distribution Data in Facial Recognition
- URL: http://arxiv.org/abs/2404.03876v3
- Date: Tue, 25 Jun 2024 02:20:06 GMT
- Title: Accurately Classifying Out-Of-Distribution Data in Facial Recognition
- Authors: Gianluca Barone, Aashrit Cunchala, Rudy Nunez,
- Abstract summary: Real-life scenarios typically feature unseen data which is different from data in the training distribution.
This issue is most prevalent in social justice problems where data from under-represented groups may appear in the test data without representing an equal proportion of the training data.
We are interested in the following question: Can the performance of a neural network improve on facial images of out-of-distribution data when it is trained simultaneously on multiple datasets of in-distribution data?
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard classification theory assumes that the distributions of images in the test and training sets are identical. Unfortunately, real-life scenarios typically feature unseen data ("out-of-distribution data") that differs from the data in the training distribution ("in-distribution"). This issue is most prevalent in social justice problems, where data from under-represented groups may appear in the test data without representing an equal proportion of the training data. This may result in a model returning confidently wrong decisions and predictions. We are interested in the following question: Can the performance of a neural network improve on facial images of out-of-distribution data when it is trained simultaneously on multiple datasets of in-distribution data? We approach this problem by incorporating the Outlier Exposure model and investigating how the model's performance changes when other datasets of facial images are incorporated. We observe that the accuracy and other metrics of the model can be increased by applying Outlier Exposure, incorporating a trainable weight parameter to increase the model's emphasis on outlier images, and re-weighting the importance of different class labels. We also experimented with whether sorting the images and determining outliers via image features would affect the metrics more than sorting by average pixel value. Our goal was to make models not only more accurate but also fairer by scanning a broader range of images. We also tested the datasets in reverse order to see whether a fairer dataset with balanced features affects the model's accuracy.
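The abstract names three ingredients: Outlier Exposure, a weight on the outlier term, and re-weighted class labels. The sketch below shows one common way these are combined, using the standard Outlier Exposure objective (cross-entropy on in-distribution samples plus a term that pushes outlier predictions toward the uniform distribution). It is an illustration under assumptions, not the authors' implementation: the function names, the `lam` parameter, and the `class_weights` list are all hypothetical, and `lam` is a plain float here, whereas the paper describes a trainable weight that would be a learnable parameter in an autodiff framework.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, label, class_weights=None):
    """Cross-entropy for one labelled sample, optionally scaled per class."""
    w = 1.0 if class_weights is None else class_weights[label]
    return -w * log_softmax(logits)[label]


def uniform_cross_entropy(logits):
    """Cross-entropy between the uniform distribution and the softmax output.

    It is smallest (log of the number of classes) when the prediction is
    uniform, so minimizing it discourages confident outputs on outliers.
    """
    return -sum(log_softmax(logits)) / len(logits)


def outlier_exposure_loss(in_batch, out_batch, lam, class_weights=None):
    """In-distribution loss plus lam times the outlier term.

    in_batch:  list of (logits, label) pairs from the training distribution.
    out_batch: list of logits for auxiliary outlier images (no labels).
    lam:       hypothetical weight on the outlier term (assumed fixed here).
    """
    in_loss = sum(
        cross_entropy(logits, label, class_weights) for logits, label in in_batch
    ) / len(in_batch)
    out_loss = sum(uniform_cross_entropy(logits) for logits in out_batch) / len(out_batch)
    return in_loss + lam * out_loss
```

For example, `outlier_exposure_loss([([2.0, 0.0], 0)], [[3.0, -3.0]], lam=0.5)` penalizes the confident prediction on the outlier sample, and raising `lam` increases that penalty relative to the in-distribution term.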
Related papers
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represents the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z)
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
arXiv Detail & Related papers (2024-02-28T07:54:50Z)
- Fair GANs through model rebalancing for extremely imbalanced class distributions [5.463417677777276]
We present an approach to construct an unbiased generative adversarial network (GAN) from an existing biased GAN.
We show results for the StyleGAN2 models while training on the Flickr Faces High Quality (FFHQ) dataset for racial fairness.
We further validate our approach by applying it to an imbalanced CIFAR10 dataset which is also twice as large.
arXiv Detail & Related papers (2023-08-16T19:20:06Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
Our method is benchmarked on the CIFAR100/CIFAR100LT datasets and shows outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z)
- Leaving Reality to Imagination: Robust Classification via Generated Datasets [24.411444438920988]
Recent research on robustness has revealed significant performance gaps between neural image classifiers trained on datasets similar to the test set.
We study the question: How do generated datasets influence the natural robustness of image classifiers?
We find that Imagenet classifiers trained on real data augmented with generated data achieve higher accuracy and effective robustness than standard training.
arXiv Detail & Related papers (2023-02-05T22:49:33Z)
- Example-Based Explainable AI and its Application for Remote Sensing Image Classification [0.0]
We show an example of an instance in a training dataset that is similar to the input data to be inferred.
Using a remote sensing image dataset from the Sentinel-2 satellite, the concept was successfully demonstrated.
arXiv Detail & Related papers (2023-02-03T03:48:43Z)
- Assessing Dataset Bias in Computer Vision [0.0]
Biases have the tendency to propagate to the models that train on them, often leading to poor performance on the minority class.
We will apply several augmentation techniques to a sample of the UTKFace dataset, such as undersampling, geometric transformations, variational autoencoders (VAEs), and generative adversarial networks (GANs).
We were able to show that our model has a better overall performance and consistency on age and ethnicity classification on multiple datasets when compared with the FairFace model.
arXiv Detail & Related papers (2022-05-03T22:45:49Z)
- Estimating Structural Disparities for Face Models [54.062512989859265]
In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations.
We explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation.
arXiv Detail & Related papers (2022-04-13T05:30:53Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- Towards Accuracy-Fairness Paradox: Adversarial Example-based Data Augmentation for Visual Debiasing [15.689539491203373]
Machine learning fairness concerns the biases towards certain protected or sensitive groups of people when addressing the target tasks.
This paper studies the debiasing problem in the context of image classification tasks.
arXiv Detail & Related papers (2020-07-27T15:17:52Z)