DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
- URL: http://arxiv.org/abs/2403.19235v1
- Date: Thu, 28 Mar 2024 08:47:02 GMT
- Title: DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
- Authors: Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong Liu, Jingdong Wang
- Abstract summary: We introduce DreamSalon, a noise-guided, staged-editing framework.
It focuses on detailed image manipulations and identity-context preservation.
Experiments demonstrate DreamSalon's ability to efficiently and faithfully edit fine details on human faces.
- Score: 34.372331192321944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centered images, novel challenges arise with the nuanced task of "identity fine editing": precisely modifying specific features of a subject while maintaining its inherent identity and context. Existing personalization methods, which require either time-consuming optimization or learning additional encoders, are adept at "identity re-contextualization" but often struggle with detailed and sensitive tasks like human face editing. To address these challenges, we introduce DreamSalon, a noise-guided, staged-editing framework that uniquely focuses on detailed image manipulations and identity-context preservation. By discerning editing and boosting stages via the frequency and gradient of predicted noises, DreamSalon first performs detailed manipulations on specific features in the editing stage, guided by high-frequency information, and then employs stochastic denoising in the boosting stage to improve image quality. For more precise editing, DreamSalon semantically mixes source and target textual prompts, guided by differences in their embedding covariances, to direct the model's focus to specific manipulation areas. Our experiments demonstrate DreamSalon's ability to efficiently and faithfully edit fine details on human faces, outperforming existing methods both qualitatively and quantitatively.
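The abstract describes two mechanisms: separating an "editing" stage from a "boosting" stage using the frequency and gradient of the predicted noise, and mixing source/target prompt embeddings according to differences in their covariances. The snippet below is a minimal, illustrative sketch of those two ideas only; the function names, thresholds, and the covariance-based weighting are assumptions made for illustration and are not the authors' released implementation.

```python
import torch


def high_freq_ratio(eps: torch.Tensor, cutoff: float = 0.25) -> float:
    """Fraction of the predicted noise's spectral energy above a radial cutoff."""
    spec = torch.fft.fftshift(torch.fft.fft2(eps.float()), dim=(-2, -1))
    power = spec.abs() ** 2
    h, w = eps.shape[-2:]
    yy, xx = torch.meshgrid(
        torch.linspace(-0.5, 0.5, h), torch.linspace(-0.5, 0.5, w), indexing="ij"
    )
    radius = (yy**2 + xx**2).sqrt()
    return (power[..., radius > cutoff].sum() / power.sum()).item()


def is_editing_stage(eps_t, eps_prev, freq_thresh=0.35, grad_thresh=0.05) -> bool:
    """Heuristic stage test (assumed thresholds): strong high-frequency content
    plus a still-changing noise prediction suggests the model is refining fine
    details (editing stage); otherwise treat the step as quality boosting."""
    grad = (eps_t - eps_prev).abs().mean().item()
    return high_freq_ratio(eps_t) > freq_thresh and grad > grad_thresh


def mix_prompt_embeddings(src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
    """Blend source and target token embeddings, pushing each embedding
    dimension toward the target in proportion to how much its covariance
    differs between the two prompts (illustrative weighting scheme)."""
    cov_src = torch.cov(src.T)                    # (dim, dim), tokens as observations
    cov_tgt = torch.cov(tgt.T)
    diff = (cov_tgt - cov_src).abs().mean(dim=0)  # per-dimension covariance gap
    w = diff / (diff.max() + 1e-8)                # 0 = keep source, 1 = take target
    return (1.0 - w) * src + w * tgt


# Toy usage with random stand-ins for predicted noise and CLIP-style embeddings.
eps_prev, eps_t = torch.randn(4, 64, 64), torch.randn(4, 64, 64)
src_emb, tgt_emb = torch.randn(77, 768), torch.randn(77, 768)
print("editing stage:", is_editing_stage(eps_t, eps_prev))
print("mixed embedding shape:", tuple(mix_prompt_embeddings(src_emb, tgt_emb).shape))
```

In a real pipeline, a stage test like `is_editing_stage` would be evaluated at each denoising step of a pre-trained text-to-image model, and the mixed embedding would replace the target prompt's conditioning during the editing stage.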
Related papers
- IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion [12.494492016414503]
Existing models encounter challenges such as poor editing quality, high computational costs and difficulties in preserving facial identity across diverse edits.
We propose a novel facial video editing framework that leverages the rich latent space of pre-trained text-to-image (T2I) diffusion models.
Our approach significantly reduces editing time by 80%, while maintaining temporal consistency throughout the video sequence.
arXiv Detail & Related papers (2025-01-13T18:08:27Z)
- PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium [55.72249032433108]
PersonaMagic is a stage-regulated generative technique designed for high-fidelity face customization.
Our method learns a series of embeddings within a specific timestep interval to capture face concepts.
Tests confirm the superiority of PersonaMagic over state-of-the-art methods in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2024-12-20T08:41:25Z)
- Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency [33.35678923549471]
FreeCure is a training-free framework that harnesses the intrinsic knowledge from the foundation models themselves to improve the prompt consistency of personalization models.
We enhance multiple attributes in the outputs of personalization models through a novel noise-blending strategy and an inversion-based process.
arXiv Detail & Related papers (2024-11-22T15:21:38Z)
- LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing [20.861672583434718]
We present LIPE, a two-stage framework designed to customize the generative model using a limited set of images of the same subject, and subsequently employ the model with the learned prior for non-rigid image editing.
arXiv Detail & Related papers (2024-06-25T02:56:16Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models struggle to consistently portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z)
- Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model [22.975965453227477]
We introduce a new framework called Paste, Inpaint and Harmonize via Denoising (PhD).
In our experiments, we apply PhD to subject-driven image editing tasks and explore text-driven scene generation given a reference subject.
arXiv Detail & Related papers (2023-06-13T07:43:10Z)
- Cones 2: Customizable Image Synthesis with Multiple Subjects [50.54010141032032]
We study how to efficiently represent a particular subject as well as how to appropriately compose different subjects.
By rectifying the activations in the cross-attention map, a layout specifies and separates the locations of different subjects in the image.
arXiv Detail & Related papers (2023-05-30T18:00:06Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which retains the cross-attention maps of the input image throughout the diffusion process (see the sketch after this list).
Our method requires no additional training for these edits and can directly use an existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
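The cross-attention guidance mentioned in the pix2pix-zero entry lends itself to a short illustration. The sketch below shows one hypothetical guidance step that nudges the current latent so its cross-attention maps stay close to reference maps recorded from the source image; the `get_attn_maps` hook, the step size, and the toy attention function are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def cross_attention_guidance_step(latent, ref_maps, get_attn_maps, step_size=0.1):
    """One gradient step on the latent that minimizes the L2 gap between the
    current cross-attention maps and reference maps from the source image.

    get_attn_maps: callable(latent) -> list of attention tensors; in a real
    pipeline this would be a hook into the denoiser's cross-attention layers.
    """
    latent = latent.detach().requires_grad_(True)
    loss = sum(F.mse_loss(a, r) for a, r in zip(get_attn_maps(latent), ref_maps))
    (grad,) = torch.autograd.grad(loss, latent)
    return (latent - step_size * grad).detach()


# Toy usage: a stand-in "attention" function that depends on the latent.
torch.manual_seed(0)
proj = torch.randn(16, 64)                               # fake attention projection
fake_attn = lambda z: [torch.softmax(z.flatten(1) @ proj.T, dim=-1)]
z_source, z_edit = torch.randn(1, 4, 4, 4), torch.randn(1, 4, 4, 4)
ref_maps = [m.detach() for m in fake_attn(z_source)]
z_edit = cross_attention_guidance_step(z_edit, ref_maps, fake_attn)
print("updated edit latent shape:", tuple(z_edit.shape))
```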