From Easy to Hard++: Promoting Differentially Private Image Synthesis Through Spatial-Frequency Curriculum
- URL: http://arxiv.org/abs/2601.06368v1
- Date: Sat, 10 Jan 2026 00:38:29 GMT
- Title: From Easy to Hard++: Promoting Differentially Private Image Synthesis Through Spatial-Frequency Curriculum
- Authors: Chen Gong, Kecen Li, Zinan Lin, Tianhao Wang
- Abstract summary: We propose FETA-Pro, which introduces frequency features as `training shortcuts'. FETA-Pro shows an average of 25.7% higher fidelity and 4.1% greater utility than the best-performing baseline.
- Score: 18.158096061348157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve the quality of Differentially private (DP) synthetic images, most studies have focused on improving the core optimization techniques (e.g., DP-SGD). Recently, we have witnessed a paradigm shift that takes these techniques off the shelf and studies how to use them together to achieve the best results. One notable work is DP-FETA, which proposes using `central images' for `warming up' the DP training and then using traditional DP-SGD. Inspired by DP-FETA, we are curious whether there are other such tools we can use together with DP-SGD. We first observe that using `central images' mainly works for datasets where there are many samples that look similar. To handle scenarios where images could vary significantly, we propose FETA-Pro, which introduces frequency features as `training shortcuts.' The complexity of frequency features lies between that of spatial features (captured by `central images') and full images, allowing for a finer-grained curriculum for DP training. To incorporate these two types of shortcuts together, one challenge is to handle the training discrepancy between spatial and frequency features. To address it, we leverage the pipeline generation property of generative models (instead of having one model trained with multiple features/objectives, we can have multiple models working on different features, then feed the generated results from one model into another) and use a more flexible design. Specifically, FETA-Pro introduces an auxiliary generator to produce images aligned with noisy frequency features. Then, another model is trained with these images, together with spatial features and DP-SGD. Evaluated across five sensitive image datasets, FETA-Pro shows an average of 25.7% higher fidelity and 4.1% greater utility than the best-performing baseline, under a privacy budget $\epsilon = 1$.
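The pipeline described above uses two kinds of shortcut signals: spatial features (`central images') and noised frequency features. As a rough illustration of the latter, the sketch below computes a DP-noised low-frequency summary of an image batch that an auxiliary generator could be trained to match. This is a minimal sketch under assumptions, not the authors' code: the band half-width `k`, the batch-mean aggregation, and the placement of the Gaussian-mechanism noise are all illustrative choices.

```python
import numpy as np

def noisy_low_freq_features(images, sigma, k=8):
    """Hypothetical `frequency shortcut': a noised low-frequency band.

    images: (N, H, W) float array, assumed pre-clipped so the Gaussian
    mechanism's sensitivity is bounded; sigma: DP noise scale; k: half-width
    of the retained low-frequency band around the DC component.
    """
    spectra = np.fft.fftshift(np.fft.fft2(images), axes=(-2, -1))
    h, w = images.shape[-2:]
    cy, cx = h // 2, w // 2
    band = spectra[:, cy - k:cy + k, cx - k:cx + k]   # (N, 2k, 2k) low band
    agg = np.abs(band).mean(axis=0)                   # aggregate over the batch
    return agg + np.random.normal(0.0, sigma, agg.shape)  # Gaussian mechanism
```

Per the abstract, a generator fitted to such targets supplies images that, together with spatial features, train the main model under DP-SGD.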
Related papers
- From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis [14.93795597224185]
Differentially private (DP) image synthesis aims to generate synthetic images from a sensitive dataset. We propose a two-stage DP image synthesis framework, where diffusion models learn to generate synthetic images from easy to hard. Experiments show that, averaged over four investigated image datasets, the fidelity and utility metrics of our synthetic images are 33.1% and 2.1% better than those of the state-of-the-art method.
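The `easy' stage here warms the diffusion model up on low-complexity targets such as `central images'. One plausible realization is a noised per-class mean under the Gaussian mechanism, as in this illustrative sketch; the clipping scheme and noise calibration are assumptions, not the paper's exact procedure.

```python
import numpy as np

def dp_central_image(images, sigma, clip=1.0):
    """Noised class mean as a DP `central image' (illustrative only).

    images: (N, H, W) array for one class. Each image's L2 norm is clipped
    to `clip` so the class mean has sensitivity clip / N, then Gaussian
    noise calibrated by `sigma` is added.
    """
    n = len(images)
    norms = np.linalg.norm(images.reshape(n, -1), axis=1)
    scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    clipped = images * scale[:, None, None]
    mean = clipped.mean(axis=0)
    return mean + np.random.normal(0.0, sigma * clip / n, mean.shape)
```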
arXiv Detail & Related papers (2025-04-02T06:30:55Z)
- DPImageBench: A Unified Benchmark for Differentially Private Image Synthesis [18.158096061348157]
Differentially private (DP) image synthesis aims to generate artificial images that retain the properties of sensitive images while protecting the privacy of individual images within the dataset. Despite recent advancements, inconsistent (and sometimes flawed) evaluation protocols have been applied across studies. This paper introduces DPImageBench for DP image synthesis, with thoughtful design across several dimensions.
arXiv Detail & Related papers (2025-03-18T19:37:35Z)
- One-Shot Learning for Pose-Guided Person Image Synthesis in the Wild [15.379362338850767]
Current Pose-Guided Person Image Synthesis (PGPIS) methods depend heavily on large amounts of labeled triplet data to train the generator in a supervised manner.
OnePoseTrans generates high-quality pose transfer results, offering greater stability than state-of-the-art data-driven methods.
For each test case, OnePoseTrans customizes a model in around 48 seconds with an NVIDIA V100 GPU.
arXiv Detail & Related papers (2024-09-15T02:42:25Z)
- Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm, Diffusion-ReID, to efficiently augment and generate diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset, Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z)
- Differentially Private Representation Learning via Image Captioning [51.45515227171524]
We show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.
We successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation.
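Training from scratch here means every update passes through DP-SGD. As a reminder of what a single such update involves, below is a textbook sketch of per-example gradient clipping plus Gaussian noise (generic DP-SGD, not the DP-Cap training code).

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, sigma=1.0):
    """One DP-SGD update: clip each example's gradient, average, add noise.

    per_example_grads: (B, D) array of flattened per-example gradients;
    params: (D,) flattened parameters.
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    b = len(per_example_grads)
    noisy_mean = clipped.mean(axis=0) + np.random.normal(
        0.0, sigma * clip / b, params.shape)  # Gaussian mechanism on the mean
    return params - lr * noisy_mean
```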
arXiv Detail & Related papers (2024-03-04T21:52:25Z)
- Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training [54.074020740827855]
We find that SSHPE can be boosted from two cores: advanced data augmentations and concise consistency-training schemes. This simple and compact design is interpretable, and easily benefits from newly found augmentations. We extensively validate the superiority and versatility of our approach on conventional human body images, overhead fisheye images, and human hand images.
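The consistency-training core alluded to above can be sketched generically: the prediction on a strongly augmented view is pulled toward a pseudo-label produced on a weakly augmented view. This is an illustrative outline, not the paper's exact losses; in a real framework the pseudo-label would be detached from the gradient.

```python
import numpy as np

def consistency_loss(model, weak_view, strong_view):
    """Generic semi-supervised consistency loss on pose heatmaps.

    model: callable mapping an image batch to heatmaps. The weak view's
    output serves as a fixed pseudo-label; here we simply compute the MSE.
    """
    pseudo = model(weak_view)     # pseudo-labels from the easy view
    pred = model(strong_view)     # predictions on the hard view
    return np.mean((pred - pseudo) ** 2)
```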
arXiv Detail & Related papers (2024-02-18T12:27:59Z)
- PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion [45.06392070934473]
"PanGu-Draw" is a novel latent diffusion model designed for resource-efficient text-to-image synthesis.
We introduce "Coop-Diffusion", an algorithm that enables the cooperative use of various pre-trained diffusion models.
Empirical validations of PanGu-Draw show its exceptional prowess in text-to-image and multi-control image generation.
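The generic core of using several pre-trained diffusion models cooperatively is fusing their noise predictions at each denoising step, as in the toy sketch below. Coop-Diffusion's actual contribution includes bridging mismatched latent spaces and resolutions, which this sketch ignores; the `m(x_t, t)` interface is an assumption.

```python
import numpy as np

def fused_eps(models, x_t, t, weights):
    """Weighted fusion of several diffusion models' noise predictions.

    models: callables eps_i(x_t, t) returning arrays of x_t's shape;
    weights: floats summing to 1. Assumes a shared latent space.
    """
    return sum(w * m(x_t, t) for m, w in zip(models, weights))

# Inside a sampler loop: eps = fused_eps([eps_a, eps_b], x_t, t, [0.5, 0.5])
```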
arXiv Detail & Related papers (2023-12-27T09:21:45Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
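A standard way to train such a cross-modal model is a symmetric InfoNCE-style contrastive loss between matched image and text embeddings. The sketch below shows that generic objective; it is not necessarily the paper's exact loss, and the temperature value is an assumption.

```python
import numpy as np

def log_softmax(rows):
    rows = rows - rows.max(axis=1, keepdims=True)      # numerical stability
    return rows - np.log(np.exp(rows).sum(axis=1, keepdims=True))

def info_nce(img_emb, txt_emb, tau=0.07):
    """Symmetric InfoNCE over matched image/text embedding rows.

    img_emb, txt_emb: (B, D) L2-normalized embeddings whose i-th rows
    describe the same person.
    """
    logits = img_emb @ txt_emb.T / tau                 # (B, B) similarities
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits)[idx, idx].mean()   # image -> text
    loss_t2i = -log_softmax(logits.T)[idx, idx].mean() # text -> image
    return 0.5 * (loss_i2t + loss_t2i)
```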
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Collaborative Score Distillation for Consistent Visual Synthesis [70.29294250371312]
Collaborative Score Distillation (CSD) is based on Stein Variational Gradient Descent (SVGD).
We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes.
Our results underline the capability of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
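For reference, the SVGD update that CSD builds on moves a set of particles jointly, balancing a kernel-weighted score term against a repulsion term that keeps particles diverse. The sketch below is the standard update with an RBF kernel for a generic score function; CSD's contribution lies in applying this across samples with diffusion-model scores.

```python
import numpy as np

def svgd_step(x, score, lr=0.1, h=1.0):
    """One SVGD update for particles x of shape (n, d).

    score: callable returning grad log p(x) row-wise for the target density;
    h: RBF kernel bandwidth (fixed here; often set by the median heuristic).
    """
    n = len(x)
    diff = x[:, None, :] - x[None, :, :]               # diff[i, j] = x_i - x_j
    k = np.exp(-(diff ** 2).sum(-1) / (2 * h))         # (n, n) RBF kernel
    drive = k @ score(x)                               # kernel-weighted scores
    repulse = (diff * (k / h)[..., None]).sum(axis=1)  # pushes particles apart
    return x + lr * (drive + repulse) / n
```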
arXiv Detail & Related papers (2023-07-04T17:31:50Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large differences in scale from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection that incorporates depth differences into the projection function to derive per-joint scale variants.
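The intuition behind per-joint scale variants is that joints at different depths project at different magnifications, which a single global scale cannot capture. The sketch below illustrates that effect with a plain per-joint perspective division; it is a reconstruction of the idea, not the paper's D2S formula.

```python
import numpy as np

def depth_scaled_projection(joints_3d, focal=1000.0):
    """Project 3D joints with a per-joint scale that depends on depth.

    joints_3d: (J, 3) camera-space coordinates (x, y, z) with z > 0.
    Each joint's scale is focal / z_j, so nearer joints appear larger,
    which is the depth-difference effect D2S is designed to model.
    """
    z = joints_3d[:, 2:3]
    return joints_3d[:, :2] * (focal / z)   # (J, 2) image-plane coordinates
```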
arXiv Detail & Related papers (2020-10-27T03:31:35Z)