SEED: A Benchmark Dataset for Sequential Facial Attribute Editing with Diffusion Models
- URL: http://arxiv.org/abs/2506.00562v1
- Date: Sat, 31 May 2025 13:39:45 GMT
- Title: SEED: A Benchmark Dataset for Sequential Facial Attribute Editing with Diffusion Models
- Authors: Yule Zhu, Ping Liu, Zhedong Zheng, Wei Liu
- Abstract summary: A growing class of applications now demands the ability to analyze and track sequences of progressive edits.
We introduce SEED, a large-scale Sequentially Edited facE Dataset constructed via state-of-the-art diffusion models.
SEED contains over 90,000 facial images with one to four sequential attribute modifications, generated using diverse diffusion-based editing pipelines.
- Score: 23.54274625549125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have recently enabled precise and photorealistic facial editing across a wide range of semantic attributes. Beyond single-step modifications, a growing class of applications now demands the ability to analyze and track sequences of progressive edits, such as stepwise changes to hair, makeup, or accessories. However, sequential editing introduces significant challenges in edit attribution and detection robustness, further complicated by the lack of large-scale, finely annotated benchmarks tailored explicitly for this task. We introduce SEED, a large-scale Sequentially Edited facE Dataset constructed via state-of-the-art diffusion models. SEED contains over 90,000 facial images with one to four sequential attribute modifications, generated using diverse diffusion-based editing pipelines (LEdits, SDXL, SD3). Each image is annotated with detailed edit sequences, attribute masks, and prompts, facilitating research on sequential edit tracking, visual provenance analysis, and manipulation robustness assessment. To benchmark this task, we propose FAITH, a frequency-aware transformer-based model that incorporates high-frequency cues to enhance sensitivity to subtle sequential changes. Comprehensive experiments, including systematic comparisons of multiple frequency-domain methods, demonstrate the effectiveness of FAITH and the unique challenges posed by SEED. SEED offers a challenging and flexible resource for studying progressive diffusion-based edits at scale. Dataset and code will be publicly released at: https://github.com/Zeus1037/SEED.
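The abstract describes FAITH only at a high level: a frequency-aware transformer that ingests high-frequency cues. As a minimal, hypothetical sketch of what such a cue could look like in practice (not the paper's actual architecture), the snippet below isolates an image's high-frequency residual with an FFT high-pass filter and stacks it with the RGB channels; the cutoff value and channel-concatenation fusion are our assumptions.

```python
import torch

def high_frequency_residual(images: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
    """Zero out low frequencies in the 2-D FFT and invert back to pixels.

    images: (B, C, H, W); cutoff is the discarded radius as a fraction of
    the half-spectrum. Both conventions are illustrative, not from the paper.
    """
    H, W = images.shape[-2:]
    spectrum = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))

    # Circular low-frequency mask centered on the DC component.
    yy = torch.arange(H, device=images.device).float().view(-1, 1) - H // 2
    xx = torch.arange(W, device=images.device).float().view(1, -1) - W // 2
    low_freq = torch.sqrt(yy ** 2 + xx ** 2) <= cutoff * min(H, W) / 2

    spectrum = spectrum.masked_fill(low_freq, 0)
    return torch.fft.ifft2(torch.fft.ifftshift(spectrum, dim=(-2, -1))).real

# A frequency-aware detector could stack the residual with the RGB input.
x = torch.rand(2, 3, 224, 224)
detector_input = torch.cat([x, high_frequency_residual(x)], dim=1)  # (2, 6, H, W)
```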
Related papers
- Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing [26.02149948089938]
Instruction Influence Disentanglement (IID) is a novel framework enabling parallel execution of multiple instructions in a single denoising process.
We analyze self-attention mechanisms in DiTs and derive instruction-specific attention masks to disentangle each instruction's influence.
IID reduces diffusion steps while improving fidelity and instruction completion compared to existing baselines.
arXiv Detail & Related papers (2025-04-07T07:26:25Z)
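The IID summary mentions instruction-specific attention masks without detailing them. A minimal sketch of the general idea, where the block-diagonal masking scheme and all names are our assumptions rather than the paper's method:

```python
import torch

def instruction_block_mask(instr_ids: torch.Tensor) -> torch.Tensor:
    """Boolean (L, L) mask allowing attention only within one instruction.

    instr_ids: (L,) integer id of the instruction each text token belongs to.
    True means "may attend"; cross-instruction pairs are blocked.
    """
    return instr_ids.unsqueeze(0) == instr_ids.unsqueeze(1)

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed pairs set to -inf."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return scores.softmax(dim=-1) @ v

# Toy usage: tokens 0-3 belong to instruction 0, tokens 4-7 to instruction 1.
ids = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
q = k = v = torch.randn(8, 16)
out = masked_attention(q, k, v, instruction_block_mask(ids))
```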
- Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation [70.95380821618711]
Dichotomous Image Segmentation (DIS) tasks require highly precise annotations.
Current generative models and techniques struggle with the issues of scene deviations, noise-induced errors, and limited training sample variability.
We introduce a novel approach, which provides a scalable solution for generating diverse and precise datasets.
arXiv Detail & Related papers (2024-12-26T06:37:25Z)
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z)
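The Stable Flow summary says vital layers are identified automatically but does not say how. One generic probe (a sketch under our own assumptions, not the paper's procedure) is to bypass each residual block in turn and measure how much the final output drifts:

```python
import torch

@torch.no_grad()
def rank_vital_layers(blocks, x, metric=torch.nn.functional.mse_loss):
    """Score each block by how much skipping it perturbs the final output.

    blocks: sequential residual blocks (hypothetical stand-in for DiT layers).
    Returns one score per block; higher = more vital to image formation.
    """
    def forward(skip=None):
        h = x
        for i, blk in enumerate(blocks):
            if i != skip:
                h = blk(h)
        return h

    reference = forward()
    return [metric(forward(skip=i), reference).item() for i in range(len(blocks))]

# Toy usage with stand-in linear blocks.
blocks = [torch.nn.Linear(16, 16) for _ in range(4)]
scores = rank_vital_layers(blocks, torch.randn(2, 16))
```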
- DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models [79.0135981840682]
We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to enable precise inversion for discrete diffusion models.
By recording noise sequences and masking patterns during the reverse diffusion process, DICE enables accurate reconstruction and flexible editing of discrete data.
Our results show that DICE preserves high data fidelity while enhancing editing capabilities, offering new opportunities for fine-grained content manipulation in discrete spaces.
arXiv Detail & Related papers (2024-10-10T17:59:48Z)
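Per the summary, DICE's key mechanism is recording per-step randomness so the reverse trajectory can be reproduced and selectively edited. A toy sketch of that idea using Gumbel-max sampling for a discrete sampler; the `model(x_t, t)` interface and every name here are our illustrative assumptions, not DICE itself:

```python
import torch

def gumbel_sample(logits: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Gumbel-max sampling: fixing the noise makes the draw deterministic."""
    return (logits + noise).argmax(dim=-1)

def sample_and_record(model, x_t, num_steps):
    """Run reverse diffusion, recording the per-step Gumbel noise."""
    record = []
    for t in reversed(range(num_steps)):
        logits = model(x_t, t)  # (B, L, V) token logits; interface assumed
        u = torch.rand_like(logits).clamp_min(1e-20)
        noise = -torch.log(-torch.log(u))  # Gumbel(0, 1)
        record.append(noise)
        x_t = gumbel_sample(logits, noise)
    return x_t, record

def replay(model, x_t, record, num_steps):
    """Replaying the recorded noise reproduces the trajectory exactly,
    so an edit (e.g., swapped logits at one step) stays localized."""
    for t, noise in zip(reversed(range(num_steps)), record):
        x_t = gumbel_sample(model(x_t, t), noise)
    return x_t
```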
- Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z)
- Collaborative Score Distillation for Consistent Visual Synthesis [70.29294250371312]
Collaborative Score Distillation (CSD) is based on Stein Variational Gradient Descent (SVGD).
We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes.
Our results underscore the effectiveness of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
arXiv Detail & Related papers (2023-07-04T17:31:50Z)
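CSD builds on SVGD, whose update moves a set of particles along a kernel-smoothed score while a repulsive kernel-gradient term preserves diversity. A self-contained toy implementation of plain SVGD (not CSD itself; the RBF kernel, bandwidth, and step size are our choices):

```python
import torch

def rbf_kernel(x: torch.Tensor, bandwidth: float = 1.0):
    """RBF kernel matrix and its gradient w.r.t. the first argument."""
    diff = x.unsqueeze(1) - x.unsqueeze(0)              # (n, n, d)
    k = torch.exp(-(diff ** 2).sum(-1) / (2 * bandwidth ** 2))  # (n, n)
    grad_k = -diff / bandwidth ** 2 * k.unsqueeze(-1)   # grad of k(x_j, x_i) in x_j
    return k, grad_k

def svgd_step(x: torch.Tensor, score_fn, step_size: float = 0.1):
    """One SVGD update: kernel-weighted score plus a repulsive term."""
    n = x.shape[0]
    k, grad_k = rbf_kernel(x)
    score = score_fn(x)                    # (n, d), estimates grad log p(x)
    phi = (k @ score + grad_k.sum(0)) / n  # (n, d) update direction
    return x + step_size * phi

# Toy usage: push particles toward a standard normal (score = -x).
particles = torch.randn(64, 2) * 3
for _ in range(200):
    particles = svgd_step(particles, score_fn=lambda x: -x)
```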
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)