Related papers: Accurately Classifying Out-Of-Distribution Data in Facial Recognition

URL: http://arxiv.org/abs/2404.03876v3
Date: Tue, 25 Jun 2024 02:20:06 GMT
Title: Accurately Classifying Out-Of-Distribution Data in Facial Recognition
Authors: Gianluca Barone, Aashrit Cunchala, Rudy Nunez,
Abstract summary: Real-life scenarios typically feature unseen data which is different from data in the training distribution. This issue is most prevalent in social justice problems where data from under-represented groups may appear in the test data without representing an equal proportion of the training data. We are interested in the following question: Can the performance of a neural network improve on facial images of out-of-distribution data when it is trained simultaneously on multiple datasets of in-distribution data?
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Standard classification theory assumes that the distributions of images in the test and training sets are identical. Unfortunately, real-life scenarios typically feature unseen data ("out-of-distribution data") which is different from data in the training distribution ("in-distribution"). This issue is most prevalent in social justice problems where data from under-represented groups may appear in the test data without representing an equal proportion of the training data. This may result in a model returning confidently wrong decisions and predictions. We are interested in the following question: Can the performance of a neural network improve on facial images of out-of-distribution data when it is trained simultaneously on multiple datasets of in-distribution data? We approach this problem by incorporating the Outlier Exposure model and investigate how the model's performance changes when other datasets of facial images were implemented. We observe that the accuracy and other metrics of the model can be increased by applying Outlier Exposure, incorporating a trainable weight parameter to increase the machine's emphasis on outlier images, and by re-weighting the importance of different class labels. We also experimented with whether sorting the images and determining outliers via image features would have more of an effect on the metrics than sorting by average pixel value. Our goal was to make models not only more accurate but also more fair by scanning a more expanded range of images. We also tested the datasets in reverse order to see whether a more fair dataset with balanced features has an effect on the model's accuracy.
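
The abstract combines three ingredients: an Outlier Exposure term, a weight on that term, and per-class re-weighting. The sketch below shows one common way such a loss can be assembled, using only the Python standard library. It is not the authors' implementation: the function names, the `class_weights` list, and the scalar `lam` are hypothetical, and `lam` is a plain float here, whereas the abstract describes a trainable weight parameter. The outlier term used is the usual cross-entropy against a uniform distribution over classes.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def weighted_cross_entropy(logits, label, class_weights):
    """Cross-entropy for one in-distribution sample, scaled by its class weight."""
    return -class_weights[label] * log_softmax(logits)[label]


def uniform_cross_entropy(logits):
    """Cross-entropy between a uniform target and the softmax prediction.

    It is smallest when the model predicts the uniform distribution, so it
    penalizes confident predictions on outlier samples.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def outlier_exposure_loss(in_batch, out_batch, class_weights, lam):
    """Mean weighted CE on in-distribution data plus lam times the outlier term.

    in_batch:  list of (logits, label) pairs (hypothetical layout)
    out_batch: list of logits for outlier samples
    lam:       weight on the outlier term (assumed fixed in this sketch)
    """
    in_loss = sum(
        weighted_cross_entropy(logits, label, class_weights)
        for logits, label in in_batch
    ) / len(in_batch)
    out_loss = sum(uniform_cross_entropy(logits) for logits in out_batch) / len(out_batch)
    return in_loss + lam * out_loss
```

Raising an entry of `class_weights` increases the contribution of that class to the in-distribution term, and raising `lam` increases the emphasis on outlier images. In a real training loop `lam` would be a learnable parameter updated alongside the network weights.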

Related papers

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets. Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize. Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
arXiv Detail & Related papers (2024-02-28T07:54:50Z)
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation [7.183341902583164]
In this work, we analyze the role played by the label augmentation aspect of data augmentation methods. We first prove that linear models on binary classification data trained with label augmentation learn only the minimum variance features in the data. We then use our techniques to show that even for nonlinear models and general data distributions, the label smoothing and Mixup losses are lower bounded by a function of the model output variance.
arXiv Detail & Related papers (2024-02-10T01:36:39Z)
DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets [4.833815605196965]
This paper presents a novel method for addressing data imbalance in machine learning. It computes sample likelihoods based on image appearance using deep perceptual embeddings and clustering. It then uses these likelihoods to weigh samples differently during training with a proposed Generalized Focal Loss function.
arXiv Detail & Related papers (2023-08-19T02:11:49Z)
Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style. Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction. By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z)
Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data. We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations. Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
Example-Based Explainable AI and its Application for Remote Sensing Image Classification [0.0]
We show an example of an instance in a training dataset that is similar to the input data to be inferred. Using a remote sensing image dataset from the Sentinel-2 satellite, the concept was successfully demonstrated.
arXiv Detail & Related papers (2023-02-03T03:48:43Z)
Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases [62.54519787811138]
We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues. We rank images within their classes based on spuriosity, proxied via deep neural features of an interpretable network. Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.
arXiv Detail & Related papers (2022-12-05T23:15:43Z)
Assessing Dataset Bias in Computer Vision [0.0]
Biases have the tendency to propagate to the models that train on them, often leading to a poor performance in the minority class. We will apply several augmentation techniques on a sample of the UTKFace dataset, such as undersampling, geometric transformations, variational autoencoders (VAEs), and generative adversarial networks (GANs). We were able to show that our model has a better overall performance and consistency on age and ethnicity classification on multiple datasets when compared with the FairFace model.
arXiv Detail & Related papers (2022-05-03T22:45:49Z)
Estimating Structural Disparities for Face Models [54.062512989859265]
In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations. We explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation.
arXiv Detail & Related papers (2022-04-13T05:30:53Z)
Understanding Gender and Racial Disparities in Image Recognition Models [0.0]
We investigate a multi-label softmax loss with cross-entropy as the loss function instead of a binary cross-entropy on a multi-label classification problem. We use the MR2 dataset to evaluate the fairness in the model outcomes and try to interpret the mistakes by looking at model activations and suggest possible fixes.
arXiv Detail & Related papers (2021-07-20T01:05:31Z)
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases. Our method trains a lower capacity model in an ensemble with a higher capacity model. We show improvement in all settings, including a 10 point gain on the visual question answering dataset.
arXiv Detail & Related papers (2020-11-07T22:20:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.