Personalized Face Inpainting with Diffusion Models by Parallel Visual
Attention
- URL: http://arxiv.org/abs/2312.03556v1
- Date: Wed, 6 Dec 2023 15:39:03 GMT
- Title: Personalized Face Inpainting with Diffusion Models by Parallel Visual
Attention
- Authors: Jianjin Xu, Saman Motamed, Praneetha Vaddamanu, Chen Henry Wu,
Christian Haene, Jean-Charles Bazin, Fernando de la Torre
- Abstract summary: This paper proposes the use of Parallel Visual Attention (PVA) in conjunction with diffusion models to improve inpainting results.
We train the added attention modules and identity encoder on CelebAHQ-IDI, a dataset proposed for identity-preserving face inpainting.
Experiments demonstrate that PVA attains unparalleled identity resemblance in both face inpainting and face inpainting with language guidance tasks.
- Score: 55.33017432880408
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Face inpainting is important in various applications, such as photo
restoration, image editing, and virtual reality. Despite the significant
advances in face generative models, ensuring that a person's unique facial
identity is maintained during the inpainting process is still an elusive goal.
Current state-of-the-art techniques, exemplified by MyStyle, necessitate
resource-intensive fine-tuning and a substantial number of images for each new
identity. Furthermore, existing methods often fall short in accommodating
user-specified semantic attributes, such as beard or expression. To improve
inpainting results and reduce computational complexity during inference,
this paper proposes the use of Parallel Visual Attention (PVA) in conjunction
with diffusion models. Specifically, we insert parallel attention matrices into
each cross-attention module of the denoising network; these attend to features
extracted from reference images by an identity encoder. We train the added
attention modules and identity encoder on CelebAHQ-IDI, a dataset proposed for
identity-preserving face inpainting. Experiments demonstrate that PVA attains
unparalleled identity resemblance in both face inpainting and face inpainting
with language guidance tasks, in comparison to various benchmarks, including
MyStyle, Paint by Example, and Custom Diffusion. Our findings reveal that PVA
ensures good identity preservation while offering effective
language-controllability. Additionally, in contrast to Custom Diffusion, PVA
requires just 40 fine-tuning steps for each new identity, which translates to a
significant speed increase of over 20 times.
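The parallel-attention idea described in the abstract can be sketched as follows. This is a minimal illustration assuming a standard multi-head cross-attention layout in the denoising network; the class name, dimensions, and the simple summation of the two branches are assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionWithPVA(nn.Module):
    """Cross-attention block with a parallel visual-attention branch.

    The text branch attends to prompt embeddings; the parallel branch
    attends to identity features extracted from reference images. Per the
    abstract, only the parallel branch and the identity encoder are trained.
    """

    def __init__(self, dim=320, text_dim=768, id_dim=512, heads=8):
        super().__init__()
        self.attn_text = nn.MultiheadAttention(
            dim, heads, kdim=text_dim, vdim=text_dim, batch_first=True)
        self.attn_id = nn.MultiheadAttention(
            dim, heads, kdim=id_dim, vdim=id_dim, batch_first=True)

    def forward(self, x, text_feats, id_feats):
        # x: (B, N, dim) spatial tokens from the denoising network
        # text_feats: (B, T, text_dim) prompt embeddings
        # id_feats: (B, R, id_dim) identity-encoder features
        out_text, _ = self.attn_text(x, text_feats, text_feats)
        out_id, _ = self.attn_id(x, id_feats, id_feats)
        return x + out_text + out_id  # branches run in parallel and are summed

# Illustrative usage with dummy tensors:
block = CrossAttentionWithPVA()
x = torch.randn(2, 64, 320)
out = block(x, torch.randn(2, 77, 768), torch.randn(2, 4, 512))
print(out.shape)  # torch.Size([2, 64, 320])
```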
Related papers
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z) - Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping; one plausible reading of the technique is sketched after this entry.
Our approach is largely unified, which makes it resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z) - Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
- Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams (one possible reading is sketched after this list).
arXiv Detail & Related papers (2024-03-18T13:39:53Z) - PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved
Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation.
PortraitBooth eliminates computational overhead and mitigates identity distortion.
It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for
Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - Disentangling Identity and Pose for Facial Expression Recognition [54.50747989860957]
We propose an identity and pose disentangled facial expression recognition (IPD-FER) model to learn more discriminative feature representations.
For the identity encoder, a well-pretrained face recognition model is used and kept fixed during training, which relaxes the requirement for expression-specific training data.
By comparing the difference between synthesized neutral and expressional images of the same individual, the expression component is further disentangled from identity and pose.
arXiv Detail & Related papers (2022-08-17T06:48:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.