Dataset Cleaning -- A Cross Validation Methodology for Large Facial
Datasets using Face Recognition
- URL: http://arxiv.org/abs/2003.10815v1
- Date: Tue, 24 Mar 2020 13:01:13 GMT
- Title: Dataset Cleaning -- A Cross Validation Methodology for Large Facial
Datasets using Face Recognition
- Authors: Viktor Varkarakis, Peter Corcoran
- Abstract summary: In recent years, large "in the wild" face datasets have been released in an attempt to facilitate progress in tasks such as face detection and face recognition.
Due to the automatic way these datasets are gathered and their large size, many identity folders contain mislabeled samples, which degrades the quality of the datasets.
This work presents a semi-automatic method for cleaning noisy large face datasets using face recognition.
- Score: 0.40611352512781856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, large "in the wild" face datasets have been released in an
attempt to facilitate progress in tasks such as face detection and face
recognition. Most of these datasets are acquired from webpages with automatic
procedures. As a consequence, noisy data are often found. Furthermore, in these
large face datasets, the annotation of identities is important, as the datasets
are used for training face recognition algorithms. But because of the automatic
way these datasets are gathered and their large size, many identity folders
contain mislabeled samples, which degrades the quality of the datasets. This
work presents a semi-automatic method for cleaning noisy large face datasets
using face recognition. The methodology is applied to clean the CelebA dataset,
demonstrating its effectiveness. Furthermore, the list of mislabeled samples
in the CelebA dataset is made available.
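The abstract does not spell out the cleaning procedure, so the following is only a minimal sketch of the general idea of using face recognition to surface suspect labels, not the authors' method. It assumes each image in an identity folder already has an embedding from some face recognition model. The function names, the toy embeddings, and the 0.5 threshold are all hypothetical. A sample is flagged for manual review when its mean cosine similarity to the other samples in the same folder is low.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flag_suspects(folder, threshold=0.5):
    """Return sorted names of samples that look inconsistent with their folder.

    `folder` maps an image name to its (hypothetical) face embedding.
    `threshold` is an assumed value; in practice it would need tuning.
    """
    suspects = []
    for name, emb in folder.items():
        others = [e for n, e in folder.items() if n != name]
        if not others:
            continue  # a single-image folder cannot be cross-checked
        mean_sim = sum(cosine(emb, o) for o in others) / len(others)
        if mean_sim < threshold:
            suspects.append(name)
    return sorted(suspects)


# Toy identity folder: three consistent samples and one outlier.
folder = {
    "img_a.jpg": [1.0, 0.0, 0.0],
    "img_b.jpg": [0.9, 0.1, 0.0],
    "img_c.jpg": [0.95, 0.05, 0.0],
    "img_d.jpg": [0.0, 1.0, 0.0],
}
print(flag_suspects(folder))
```

In a semi-automatic workflow, the flagged names would go to a human reviewer instead of being deleted outright, which matches the "semi-automatic" framing in the abstract.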
Related papers
- Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method [77.65459419417533]
We put face forgery in a semantic context and define that computational methods that alter semantic face attributes are sources of face forgery.
We construct a large face forgery image dataset, where each image is associated with a set of labels organized in a hierarchical graph.
We propose a semantics-oriented face forgery detection method that captures label relations and prioritizes the primary task.
arXiv Detail & Related papers (2024-05-14T10:24:19Z)
- DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis [71.40724659748787]
DiffusionFace is the first diffusion-based face forgery dataset.
It covers various forgery categories, including unconditional and Text Guide facial image generation, Img2Img, Inpaint, and Diffusion-based facial exchange algorithms.
It provides essential metadata and a real-world internet-sourced forgery facial image dataset for evaluation.
arXiv Detail & Related papers (2024-03-27T11:32:44Z)
- Attribute-preserving Face Dataset Anonymization via Latent Code Optimization [64.4569739006591]
We present a task-agnostic anonymization procedure that directly optimizes the images' latent representation in the latent space of a pre-trained GAN.
We demonstrate through a series of experiments that our method is capable of anonymizing the identity of the images while, crucially, better preserving the facial attributes.
arXiv Detail & Related papers (2023-03-20T17:34:05Z)
- How to Boost Face Recognition with StyleGAN? [13.067766076889995]
State-of-the-art face recognition systems require vast amounts of labeled training data.
The self-supervised revolution in the industry motivates research on adapting related techniques to facial recognition.
We show that a simple approach based on fine-tuning the pSp encoder for StyleGAN allows us to improve upon state-of-the-art facial recognition.
arXiv Detail & Related papers (2022-10-18T18:41:56Z)
- FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild [50.8865921538953]
We propose a method to explicitly incorporate facial semantics into age estimation.
We design a face parsing-based network to learn semantic information at different scales.
We show that our method consistently outperforms all existing age estimation methods.
arXiv Detail & Related papers (2021-06-21T14:31:32Z)
- How important are faces for person re-identification? [14.718372669984364]
We apply a face detection and blurring algorithm to create anonymized versions of several popular person re-identification datasets.
We evaluate the effect of this anonymization on re-identification performance using standard metrics.
arXiv Detail & Related papers (2020-10-13T11:47:16Z)
- Masked Face Recognition for Secure Authentication [2.429066522170765]
Masked faces are difficult to detect and recognize, thereby threatening to make in-house datasets invalid.
We present an open-source tool, MaskTheFace, to mask faces effectively, creating a large dataset of masked faces.
We report an increase of 38% in the true positive rate for the Facenet system.
arXiv Detail & Related papers (2020-08-25T15:33:59Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- A Method for Curation of Web-Scraped Face Image Datasets [13.893682217746816]
A variety of issues occur when collecting a dataset in-the-wild.
With the number of images being in the millions, a manual cleaning procedure is not feasible.
We propose a semi-automated method, where the goal is to have a clean dataset for testing face recognition methods.
arXiv Detail & Related papers (2020-04-07T01:57:32Z)
- Boosting Unconstrained Face Recognition with Auxiliary Unlabeled Data [59.85605718477639]
We present an approach to use unlabeled faces to learn generalizable face representations.
Experimental results on unconstrained datasets show that a small amount of unlabeled data with sufficient diversity can lead to an appreciable gain in recognition performance.
arXiv Detail & Related papers (2020-03-17T20:58:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.