Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks
- URL: http://arxiv.org/abs/2311.11882v1
- Date: Mon, 20 Nov 2023 16:19:46 GMT
- Authors: Rami Haffar, David Sánchez, and Josep Domingo-Ferrer
- Abstract summary: Recent privacy regulations have restricted the ways in which human images may be collected and used for research.
Several previously published data sets containing human faces have been removed from the internet due to inadequate data collection methods.
We present the Multi-Task Faces (MTF) image data set, a meticulously curated collection of face images designed for various classification tasks.
- Score: 3.1133049660590615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human facial data hold tremendous potential to address a variety of
classification problems, including face recognition, age estimation, gender
identification, emotion analysis, and race classification. However, recent
privacy regulations, such as the EU General Data Protection Regulation and
others, have restricted the ways in which human images may be collected and
used for research. As a result, several previously published data sets
containing human faces have been removed from the internet due to inadequate
data collection methods that failed to meet privacy regulations. Data sets
consisting of synthetic data have been proposed as an alternative, but they
fall short of accurately representing the real data distribution. Moreover,
most available data sets are labeled for just a single task, which limits
their applicability. To address these issues, we present the Multi-Task Faces
(MTF) image data set, a meticulously curated collection of face images designed
for various classification tasks, including face recognition, as well as race,
gender, and age classification. The MTF data set has been ethically gathered by
leveraging publicly available images of celebrities and strictly adhering to
copyright regulations. In this paper, we present this data set and provide
detailed descriptions of the data collection and processing procedures
followed. Furthermore, we evaluate the performance of five deep learning (DL)
models on the MTF data set across the aforementioned classification tasks.
Additionally, we compare the performance of DL models over the processed MTF
data and over raw data crawled from the internet. The reported results
constitute a baseline for further research employing these data. The MTF data
set can be accessed through the following link (please cite the present paper
if you use the data set): https://github.com/RamiHaf/MTF_data_set
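The following minimal Python sketch shows what multi-task labeling means in practice. It assumes each face image carries one label per task (identity, race, gender, age). The record layout, field names, and example label values are hypothetical and do not reflect the actual file format of the MTF repository. Accuracy is used only as an example metric. The function reports it separately for each task, which is the kind of per-task comparison the abstract describes for its five DL models.

```python
from dataclasses import dataclass

# Hypothetical record layout: one face image labeled for several tasks at once.
# The field names are illustrative assumptions, not the MTF repository's format.
@dataclass(frozen=True)
class FaceRecord:
    image_path: str
    identity: str
    race: str
    gender: str
    age_group: str


TASKS = ("identity", "race", "gender", "age_group")


def per_task_accuracy(records, predictions):
    """Return {task: accuracy}; predictions[i] maps task -> predicted label for records[i]."""
    if not records:
        raise ValueError("records must not be empty")
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    correct = {task: 0 for task in TASKS}
    for record, pred in zip(records, predictions):
        for task in TASKS:
            # A missing prediction for a task counts as wrong.
            if pred.get(task) == getattr(record, task):
                correct[task] += 1
    return {task: correct[task] / len(records) for task in TASKS}


if __name__ == "__main__":
    # Toy data with placeholder labels, for illustration only.
    records = [
        FaceRecord("img_001.jpg", "person_1", "group_a", "female", "20-29"),
        FaceRecord("img_002.jpg", "person_2", "group_b", "male", "30-39"),
    ]
    predictions = [
        {"identity": "person_1", "race": "group_a", "gender": "female", "age_group": "30-39"},
        {"identity": "person_2", "race": "group_a", "gender": "male", "age_group": "30-39"},
    ]
    print(per_task_accuracy(records, predictions))
    # {'identity': 1.0, 'race': 0.5, 'gender': 1.0, 'age_group': 0.5}
```

Because all task labels sit on one record, a single image can be reused for recognition and for each attribute-classification task. The abstract contrasts this with data sets labeled for just a single task.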
Related papers
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represents the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv (2024-07-15T17:10:31Z)
- SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes [14.966767182001755]
We propose a methodology for generating synthetic face image datasets that capture a broader spectrum of facial diversity.
Specifically, our approach integrates not only demographics and biometrics but also non-permanent traits such as make-up, hairstyle, and accessories.
Prompts built from these attributes guide a state-of-the-art text-to-image model in generating a comprehensive dataset of high-quality realistic images.
arXiv (2024-04-26T08:51:31Z)
- DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis [71.40724659748787]
DiffusionFace is the first diffusion-based face forgery dataset.
It covers various forgery categories, including unconditional and text-guided facial image generation, Img2Img, inpainting, and diffusion-based face-swapping algorithms.
It provides essential metadata and a real-world internet-sourced forgery facial image dataset for evaluation.
arXiv (2024-03-27T11:32:44Z)
- Disguise without Disruption: Utility-Preserving Face De-Identification [40.484745636190034]
We introduce Disguise, a novel algorithm that seamlessly de-identifies facial images while ensuring the usability of the modified data.
Our method involves extracting and substituting depicted identities with synthetic ones, generated using variational mechanisms to maximize obfuscation and non-invertibility.
We extensively evaluate our method using multiple datasets, demonstrating a higher de-identification rate and superior consistency compared to prior approaches in various downstream tasks.
arXiv (2023-03-23T13:50:46Z)
- ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted on six image classification datasets, comprising three natural-object datasets and three medical datasets.
arXiv (2022-12-04T08:49:14Z)
- Assessing Demographic Bias Transfer from Dataset to Model: A Case Study in Facial Expression Recognition [1.5340540198612824]
Of the three proposed metrics, two focus on the representational and stereotypical bias of the dataset, and the third on the residual bias of the trained model.
We demonstrate the usefulness of the metrics by applying them to a facial expression recognition (FER) problem based on the popular AffectNet dataset.
arXiv (2022-05-20T09:40:42Z)
- EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset [92.537021496096]
Recent deep face hallucination methods show stunning performance in super-resolving severely degraded facial images.
However, it remains unclear how these algorithms perform on public face hallucination datasets.
This paper builds a public Ethnically Diverse Face dataset, EDFace-Celeb-1M, and designs a benchmark task for face hallucination.
arXiv (2021-10-11T06:53:24Z)
- Personalized Image Semantic Segmentation [58.980245748434]
We generate more accurate segmentation results on unlabeled personalized images by investigating the data's personalized traits.
We propose a baseline method that incorporates the inter-image context when segmenting certain images.
The code and the PIS dataset will be made publicly available.
arXiv (2021-07-24T04:03:11Z)
- Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary [10.809871958865447]
We introduce FELICIA (FEderated LearnIng with a CentralIzed Adversary), a generative mechanism enabling collaborative learning.
We show how a data owner with limited and biased data could benefit from other data owners while keeping data from all the sources private.
This is a common scenario in medical image analysis where privacy legislation prevents data from being shared outside local premises.
arXiv (2021-01-18T18:40:46Z)
- Enhancing Facial Data Diversity with Style-based Face Aging [59.984134070735934]
Face datasets are typically biased in terms of attributes such as gender, age, and race.
We propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns.
We show that the proposed method outperforms state-of-the-art algorithms for age transfer.
arXiv (2020-06-06T21:53:44Z)
- A Method for Curation of Web-Scraped Face Image Datasets [13.893682217746816]
A variety of issues occur when collecting a dataset in the wild.
With millions of images, a manual cleaning procedure is not feasible.
We propose a semi-automated method, where the goal is to have a clean dataset for testing face recognition methods.
arXiv (2020-04-07T01:57:32Z)
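The entry on semi-automated curation of web-scraped face images (A Method for Curation of Web-Scraped Face Image Datasets) can be illustrated with a common heuristic. This is not necessarily the method of that paper or of the MTF authors. In this sketch, each image is embedded with some face-recognition model, which is assumed and not shown. The code computes each identity's mean embedding and flags images whose cosine similarity to that centroid falls below a threshold. A human then reviews only the flagged images. The function name, input format, and threshold value are hypothetical.

```python
import math
from collections import defaultdict


def _l2_normalize(vec):
    """Scale a vector to unit length; a zero vector cannot be normalized."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def flag_for_review(items, threshold=0.5):
    """Return image ids that look inconsistent with their labeled identity.

    items: iterable of (image_id, identity, embedding) tuples; the embedding is a
    list of floats from some face-recognition model (assumed, not shown).
    threshold: hypothetical cosine-similarity cut-off that would need tuning.
    """
    by_identity = defaultdict(list)
    for image_id, identity, embedding in items:
        by_identity[identity].append((image_id, _l2_normalize(embedding)))

    flagged = []
    for members in by_identity.values():
        dim = len(members[0][1])
        if any(len(vec) != dim for _, vec in members):
            raise ValueError("all embeddings must have the same dimension")
        # Centroid of the unit-length embeddings, re-normalized to unit length.
        centroid = _l2_normalize(
            [sum(vec[i] for _, vec in members) / len(members) for i in range(dim)]
        )
        for image_id, vec in members:
            # For unit vectors, the dot product equals cosine similarity.
            similarity = sum(a * b for a, b in zip(vec, centroid))
            if similarity < threshold:
                flagged.append(image_id)
    return flagged


if __name__ == "__main__":
    # Toy 2-D embeddings; real face embeddings are much higher-dimensional.
    items = [
        ("img1", "person_1", [1.0, 0.0]),
        ("img2", "person_1", [0.9, 0.1]),
        ("img3", "person_1", [0.0, 1.0]),  # inconsistent with the other person_1 images
        ("img4", "person_2", [0.0, 2.0]),
    ]
    print(flag_for_review(items, threshold=0.7))  # ['img3']
```

Only the flagged images reach a human reviewer. This keeps the manual step small even when the dataset has millions of images. The centroid can itself be pulled off course if many images for an identity are wrong, so the threshold and any iterative re-estimation are design choices that would need validation.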