Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets
- URL: http://arxiv.org/abs/2205.06218v1
- Date: Thu, 12 May 2022 17:03:57 GMT
- Title: Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets
- Authors: Kenny T. R. Voo, Liming Jiang, Chen Change Loy
- Abstract summary: We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
- Score: 83.749895930242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper performs a comprehensive analysis of datasets for occlusion-aware
face segmentation, a task that is crucial for many downstream applications. The
collection and annotation of such datasets are time-consuming and
labor-intensive. Although some efforts have been made in synthetic data
generation, the naturalistic aspect of data remains less explored. In our
study, we propose two occlusion generation techniques: Naturalistic Occlusion
Generation (NatOcc), for producing high-quality naturalistic synthetic occluded
faces, and Random Occlusion Generation (RandOcc), a more general synthetic
occluded data generation method. We empirically show the effectiveness and
robustness of both methods, even for unseen occlusions. To facilitate model
evaluation, we present two high-resolution real-world occluded face datasets
with fine-grained annotations, RealOcc and RealOcc-Wild, featuring both careful
alignment preprocessing and an in-the-wild setting for robustness testing. We
further conduct a comprehensive analysis on a newly introduced segmentation
benchmark, offering insights for future exploration.
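The abstract does not describe how NatOcc or RandOcc actually place occluders. As a rough, hypothetical illustration of what producing one synthetic occluded face involves, the sketch below pastes an occluder with a binary alpha mask onto a face image and removes the covered pixels from the face label. The function name `composite_occluder`, the pure-Python pixel grids, and the assumption that the segmentation target is a binary mask of visible (unoccluded) face pixels are all illustrative choices, not the authors' method.

```python
from typing import List, Tuple

# Hypothetical representations: an image is an H x W grid of RGB tuples,
# and a mask is an H x W grid of 0/1 values (1 = visible face pixel).
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[int]]


def composite_occluder(
    face: Image,
    face_mask: Mask,
    occluder: Image,
    occ_alpha: Mask,
    top: int,
    left: int,
) -> Tuple[Image, Mask]:
    """Paste an occluder onto a face image with its top-left corner at (top, left).

    Assumes a binary alpha mask for the occluder. Returns a new image and a new
    visible-face mask; the inputs are not modified.
    """
    height, width = len(face), len(face[0])
    out_img = [row[:] for row in face]  # row copies; pixel tuples are immutable
    out_mask = [row[:] for row in face_mask]
    for i, (occ_row, alpha_row) in enumerate(zip(occluder, occ_alpha)):
        for j, (pixel, alpha) in enumerate(zip(occ_row, alpha_row)):
            y, x = top + i, left + j
            # Skip transparent occluder pixels and pixels outside the face image.
            if alpha and 0 <= y < height and 0 <= x < width:
                out_img[y][x] = pixel
                out_mask[y][x] = 0  # covered pixels are no longer visible face
    return out_img, out_mask


if __name__ == "__main__":
    face = [[(200, 150, 120)] * 4 for _ in range(4)]  # toy 4x4 "face"
    face_mask = [[1] * 4 for _ in range(4)]           # every pixel is face
    occluder = [[(0, 0, 255)] * 2 for _ in range(2)]  # toy 2x2 occluder
    occ_alpha = [[1, 0], [1, 1]]                      # one transparent pixel
    _, mask = composite_occluder(face, face_mask, occluder, occ_alpha, 1, 1)
    for row in mask:
        print(row)
    # [1, 1, 1, 1]
    # [1, 0, 1, 1]
    # [1, 0, 0, 1]
    # [1, 1, 1, 1]
```

A realistic pipeline would likely add soft alpha blending, random scaling and rotation of the occluder, and color matching; those steps are omitted here to keep the label update easy to follow.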
Related papers
- SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data [44.304022773272415]
We introduce SynFER, a novel framework for synthesizing facial expression image data based on high-level textual descriptions.
We propose a semantic guidance technique to steer the generation process and a pseudo-label generator to help rectify the facial expression labels.
Our approach achieves a 67.23% classification accuracy on AffectNet when training solely with synthetic data equivalent to the AffectNet training set size.
Date: 2024-10-13T14:58:21Z
- Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion [20.352548473293993]
Face Recognition (FR) models are trained on large-scale datasets, which have privacy and ethical concerns.
Lately, the use of synthetic data to complement or replace genuine data for the training of FR models has been proposed.
We introduce a new method, inspired by the physical motion of soft particles subjected to Brownian forces, allowing us to sample identities in a latent space under various constraints.
With this in hand, we generate several face datasets and benchmark them by training FR models, showing that data generated with our method exceeds the performance of previous GAN-based datasets and achieves competitive performance with state-of-the-
Date: 2024-04-30T22:32:02Z
- View-Dependent Octree-based Mesh Extraction in Unbounded Scenes for Procedural Synthetic Data [71.22495169640239]
Procedural signed distance functions (SDFs) are a powerful tool for modeling large-scale detailed scenes.
We propose OcMesher, a mesh extraction algorithm that efficiently handles high-detail unbounded scenes with perfect view-consistency.
Date: 2023-12-13T18:56:13Z
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
Date: 2023-10-25T20:32:02Z
- Statistical properties and privacy guarantees of an original distance-based fully synthetic data generation method [0.0]
This work shows the technical feasibility of generating publicly releasable synthetic data using a multi-step framework.
By successfully assessing the quality of data produced using a novel multi-step synthetic data generation framework, we showed the technical and conceptual soundness of the Open-CESP initiative.
Date: 2023-10-10T12:29:57Z
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
Date: 2023-05-16T07:30:29Z
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
Date: 2022-03-03T05:58:49Z
- SYNC: A Copula based Framework for Generating Synthetic Data from Aggregated Sources [8.350531869939351]
We study a synthetic data generation task called downscaling.
We propose a multi-stage framework called SYNC (Synthetic Data Generation via Gaussian Copula).
We make four key contributions in this work.
Date: 2020-09-20T16:36:25Z
- Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset.
With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset.
In this work, we argue that standard Conditional GANs are not suitable for such a task and propose a new Adversarial Network architecture and training strategy.
Date: 2020-07-06T15:59:28Z