Double Trouble? Impact and Detection of Duplicates in Face Image
Datasets
- URL: http://arxiv.org/abs/2401.14088v1
- Date: Thu, 25 Jan 2024 11:10:13 GMT
- Title: Double Trouble? Impact and Detection of Duplicates in Face Image
Datasets
- Authors: Torsten Schlett, Christian Rathgeb, Juan Tapia, Christoph Busch
- Abstract summary: Various face image datasets intended for facial biometrics research were created via web-scraping.
This work presents an approach to detect both exactly and nearly identical face image duplicates, using file and image hashes.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various face image datasets intended for facial biometrics research were
created via web-scraping, i.e. the collection of images publicly available on
the internet. This work presents an approach to detect both exactly and nearly
identical face image duplicates, using file and image hashes. The approach is
extended through the use of face image preprocessing. Additional steps based on
face recognition and face image quality assessment models reduce false
positives, and facilitate the deduplication of the face images both for intra-
and inter-subject duplicate sets. The presented approach is applied to five
datasets, namely LFW, TinyFace, Adience, CASIA-WebFace, and C-MS-Celeb (a
cleaned MS-Celeb-1M variant). Duplicates are detected within every dataset,
with hundreds to hundreds of thousands of duplicates for all except LFW. Face
recognition and quality assessment experiments indicate a minor impact on the
results through the duplicate removal. The final deduplication data is publicly
available.
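The two hash-based detection steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which file hash or image hash the authors use, so SHA-256 and a simple average hash are assumptions here, as are the helper names (`exact_duplicate_sets`, `near_duplicate_pairs`) and the `max_distance` threshold. Images are given as small grayscale grids so that the example needs only the standard library; the face preprocessing, face recognition, and quality assessment steps are not shown.

```python
import hashlib
from collections import defaultdict


def file_hash(data: bytes) -> str:
    """Hash the raw file bytes; equal digests mean byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def exact_duplicate_sets(files):
    """Group file names whose contents are byte-identical.

    `files` maps a (hypothetical) file name to its raw bytes.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[file_hash(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels):
    """Average hash of a small grayscale grid (list of rows of ints).

    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    This is an assumed stand-in for whichever image hash the paper uses.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def near_duplicate_pairs(images, max_distance=1):
    """Return name pairs whose image hashes differ in at most `max_distance` bits.

    `images` maps a (hypothetical) name to a grayscale grid. The pairwise
    loop is quadratic and only suitable for small illustrative inputs.
    """
    names = sorted(images)
    hashes = {name: average_hash(images[name]) for name in names}
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if hamming_distance(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    files = {"a.jpg": b"abc", "b.jpg": b"abc", "c.jpg": b"xyz"}
    print(exact_duplicate_sets(files))

    images = {
        "x": [[10, 200], [220, 30]],
        "y": [[12, 198], [221, 29]],   # slightly altered copy of "x"
        "z": [[200, 10], [30, 220]],   # different pattern
    }
    print(near_duplicate_pairs(images))
```

File hashes only catch byte-identical copies, while the image hash tolerates small pixel changes such as recompression, which is why the two are complementary. Candidate pairs found this way may still include false positives, which the paper addresses with its additional face recognition and quality assessment steps.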
Related papers
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection [0.0]
This study focuses on image processing-based forgery detection using Fake-Vs-Real-Faces (Hard) [10] and 140k Real and Fake Faces [61] data sets.
Two lightweight deep learning models are proposed to conduct forgery detection using these images.
The proposed lightweight deep learning models are shown to detect forgeries of facial imagery accurately and with low computational cost.
arXiv Detail & Related papers (2024-11-18T18:44:10Z)
- Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method [77.65459419417533]
We put face forgery in a semantic context and define that computational methods that alter semantic face attributes are sources of face forgery.
We construct a large face forgery image dataset, where each image is associated with a set of labels organized in a hierarchical graph.
We propose a semantics-oriented face forgery detection method that captures label relations and prioritizes the primary task.
arXiv Detail & Related papers (2024-05-14T10:24:19Z)
- DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis [71.40724659748787]
DiffusionFace is the first diffusion-based face forgery dataset.
It covers various forgery categories, including unconditional and Text Guide facial image generation, Img2Img, Inpaint, and Diffusion-based facial exchange algorithms.
It provides essential metadata and a real-world internet-sourced forgery facial image dataset for evaluation.
arXiv Detail & Related papers (2024-03-27T11:32:44Z)
- Arc2Face: A Foundation Model for ID-Consistent Human Faces [95.00331107591859]
Arc2Face is an identity-conditioned face foundation model.
It can generate diverse photo-realistic images with a higher degree of face similarity than existing models.
arXiv Detail & Related papers (2024-03-18T10:32:51Z)
- FACE-AUDITOR: Data Auditing in Facial Recognition Systems [24.082527732931677]
Few-shot-based facial recognition systems have gained increasing attention due to their scalability and ability to work with a few face images.
To prevent the face images from being misused, one straightforward approach is to modify the raw face images before sharing them.
We propose a complete toolkit FACE-AUDITOR that can query the few-shot-based facial recognition model and determine whether any of a user's face images is used in training the model.
arXiv Detail & Related papers (2023-04-05T23:03:54Z)
- FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders [81.21440457805932]
We propose a novel framework FaceMAE, where the face privacy and recognition performance are considered simultaneously.
Randomly masked face images are used to train the reconstruction module in FaceMAE.
We also perform sufficient privacy-preserving face recognition on several public face datasets.
arXiv Detail & Related papers (2022-05-23T07:19:42Z)
- Reliable Detection of Doppelgängers based on Deep Face Representations [14.832145647643848]
We assess the impact of doppelgängers on the HDA Doppelgänger and Disguised Faces in The Wild databases.
It is found that doppelgänger image pairs yield very high similarity scores resulting in a significant increase of false match rates.
We propose a doppelgänger detection method which distinguishes doppelgängers from mated comparison trials.
arXiv Detail & Related papers (2022-01-21T18:37:08Z)
- End2End Occluded Face Recognition by Masking Corrupted Features [82.27588990277192]
State-of-the-art general face recognition models do not generalize well to occluded face images.
This paper presents a novel face recognition method that is robust to occlusions based on a single end-to-end deep neural network.
Our approach, named FROM (Face Recognition with Occlusion Masks), learns to discover the corrupted features from the deep convolutional neural networks, and clean them by the dynamically learned masks.
arXiv Detail & Related papers (2021-08-21T09:08:41Z)
- When Face Recognition Meets Occlusion: A New Benchmark [37.616211206620854]
We create a simulated occlusion face recognition dataset.
It covers 804,704 face images of 10,575 subjects.
Our dataset significantly outperforms the state of the art.
arXiv Detail & Related papers (2021-03-04T03:07:42Z)
- A Method for Curation of Web-Scraped Face Image Datasets [13.893682217746816]
A variety of issues occur when collecting a dataset in-the-wild.
With the number of images being in the millions, a manual cleaning procedure is not feasible.
We propose a semi-automated method, where the goal is to have a clean dataset for testing face recognition methods.
arXiv Detail & Related papers (2020-04-07T01:57:32Z)