Haven't I Seen You Before? Assessing Identity Leakage in Synthetic Irises
- URL: http://arxiv.org/abs/2211.05629v1
- Date: Thu, 3 Nov 2022 00:34:47 GMT
- Title: Haven't I Seen You Before? Assessing Identity Leakage in Synthetic Irises
- Authors: Patrick Tinsley, Adam Czajka, Patrick Flynn
- Abstract summary: This paper presents an analysis of three different iris matchers at varying points in the GAN training process to diagnose where and when authentic training samples are in jeopardy of leaking through the generative process.
Our results show that while most synthetic samples do not show signs of identity leakage, a handful of generated samples match authentic (training) samples nearly perfectly, with consensus across all matchers.
- Score: 4.142375560633827
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Generative Adversarial Networks (GANs) have proven to be a preferred method
of synthesizing fake images of objects, such as faces, animals, and
automobiles. It is not surprising that these models can also generate ISO-compliant,
yet synthetic iris images, which can be used to augment training data for iris
matchers and liveness detectors. In this work, we trained one of the most
recent GAN models (StyleGAN3) to generate fake iris images with two primary
goals: (i) to understand the GAN's ability to produce "never-before-seen"
irises, and (ii) to investigate the phenomenon of identity leakage as a
function of the GAN's training time. Previous work has shown that personal
biometric data can inadvertently flow from training data into synthetic
samples, raising a privacy concern for subjects who accidentally appear in the
training dataset. This paper presents an analysis of three different iris
matchers at varying points in the GAN training process to diagnose where and
when authentic training samples are in jeopardy of leaking through the
generative process. Our results show that while most synthetic samples do not
show signs of identity leakage, a handful of generated samples match authentic
(training) samples nearly perfectly, with consensus across all matchers. In
order to prioritize privacy, security, and trust in the machine learning model
development process, the research community must strike a delicate balance
between the benefits of using synthetic data and the corresponding threats
against privacy from potential identity leakage.
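The leakage diagnosis described above can be illustrated with a minimal sketch. Iris matchers commonly compare binary iris codes by fractional Hamming distance, flagging a match when the distance falls below a decision threshold; the function names, code length, and the 0.32 threshold below are illustrative assumptions, not the paper's actual matchers or settings.

```python
# Hypothetical sketch: flag synthetic iris codes that match an authentic
# training code more closely than a matcher's decision threshold,
# i.e., possible identity leakage.

def hamming_distance(code_a, code_b):
    """Fractional Hamming distance between two equal-length bit sequences."""
    assert len(code_a) == len(code_b)
    diff = sum(a != b for a, b in zip(code_a, code_b))
    return diff / len(code_a)

def flag_leaked(synthetic_codes, training_codes, threshold=0.32):
    """Return indices of synthetic codes whose distance to some training
    code falls below the threshold (a candidate identity leak)."""
    leaked = []
    for i, syn in enumerate(synthetic_codes):
        if any(hamming_distance(syn, auth) < threshold
               for auth in training_codes):
            leaked.append(i)
    return leaked

# Toy example: the first synthetic code nearly duplicates a training code.
train = [[0, 1, 1, 0, 1, 0, 1, 1], [1, 1, 0, 0, 0, 1, 0, 1]]
synth = [[0, 1, 1, 0, 1, 0, 1, 0],   # 1 bit away from train[0] -> flagged
         [1, 0, 1, 1, 1, 0, 0, 0]]   # far from both -> not flagged
print(flag_leaked(synth, train))     # -> [0]
```

In the paper's setting this comparison is run at several checkpoints of GAN training, with consensus across multiple matchers strengthening the evidence of a leak.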
Related papers
- Leveraging Programmatically Generated Synthetic Data for Differentially Private Diffusion Training [4.815212947276105]
Programmatically generated synthetic data has been used in differentially private training for classification to avoid privacy leakage.
The model trained with synthetic data generates unrealistic random images, raising challenges to adapt synthetic data for generative models.
We propose DPSynGen, which leverages generated synthetic data in diffusion models to address this challenge.
arXiv Detail & Related papers (2024-12-13T04:22:23Z)
- Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data [104.30479583607918]
The 2nd FRCSyn-onGoing challenge is based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024.
We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition.
arXiv Detail & Related papers (2024-12-02T11:12:01Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, built on generative models, pose serious risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- Privacy-Safe Iris Presentation Attack Detection [4.215251065887862]
This paper proposes a framework for a privacy-safe iris presentation attack detection (PAD) method.
It is designed using solely synthetically-generated, identity-leakage-free iris images.
The method is evaluated in a classical way using state-of-the-art iris PAD benchmarks.
arXiv Detail & Related papers (2024-08-05T18:09:02Z)
- Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks [5.0243930429558885]
This paper introduces Knowledge Recycling (KR), a pipeline designed to optimise the generation and use of synthetic data for training downstream classifiers.
At the heart of this pipeline is Generative Knowledge Distillation (GKD), the proposed technique that significantly improves the quality and usefulness of the information.
The results show a significant reduction in the performance gap between models trained on real and synthetic data, with models based on synthetic data outperforming those trained on real data in some cases.
arXiv Detail & Related papers (2024-07-22T10:31:07Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the power and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Generation of Non-Deterministic Synthetic Face Datasets Guided by Identity Priors [19.095368725147367]
We propose a non-deterministic method for generating mated face images by exploiting the well-structured latent space of StyleGAN.
We create a new dataset of synthetic face images (SymFace) consisting of 77,034 samples including 25,919 synthetic IDs.
arXiv Detail & Related papers (2021-12-07T11:08:47Z)
- A Deep Learning Generative Model Approach for Image Synthesis of Plant Leaves [62.997667081978825]
We generate artificial leaf images in an automated way via advanced Deep Learning (DL) techniques.
We aim to provide a source of training samples for AI applications in modern crop management.
arXiv Detail & Related papers (2021-11-05T10:53:35Z)
- Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize the seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
- Identity-Aware CycleGAN for Face Photo-Sketch Synthesis and Recognition [61.87842307164351]
We first propose an Identity-Aware CycleGAN (IACycleGAN) model that applies a new perceptual loss to supervise the image generation network.
It improves CycleGAN on photo-sketch synthesis by paying more attention to the synthesis of key facial regions, such as eyes and nose.
We develop a mutual optimization procedure between the synthesis model and the recognition model, which iteratively synthesizes better images by IACycleGAN.
arXiv Detail & Related papers (2021-03-30T01:30:08Z)
- Multi-Spectral Image Synthesis for Crop/Weed Segmentation in Precision Farming [3.4788711710826083]
We propose an alternative solution with respect to the common data augmentation methods, applying it to the problem of crop/weed segmentation in precision farming.
We create semi-artificial samples by replacing the most relevant object classes (i.e., crop and weeds) with their synthesized counterparts.
In addition to RGB data, we take into account also near-infrared (NIR) information, generating four channel multi-spectral synthetic images.
arXiv Detail & Related papers (2020-09-12T08:49:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.