Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models
- URL: http://arxiv.org/abs/2409.07269v1
- Date: Wed, 11 Sep 2024 13:43:53 GMT
- Title: Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models
- Authors: Sanoojan Baliah, Qinliang Lin, Shengcai Liao, Xiaodan Liang, Muhammad Haris Khan,
- Abstract summary: We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Ours is a relatively unified approach and so it is resilient to errors in other off-the-shelf models.
- Score: 69.50286698375386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite promising progress in face swapping task, realistic swapped images remain elusive, often marred by artifacts, particularly in scenarios involving high pose variation, color differences, and occlusion. To address these issues, we propose a novel approach that better harnesses diffusion models for face-swapping by making following core contributions. (a) We propose to re-frame the face-swapping task as a self-supervised, train-time inpainting problem, enhancing the identity transfer while blending with the target image. (b) We introduce a multi-step Denoising Diffusion Implicit Model (DDIM) sampling during training, reinforcing identity and perceptual similarities. (c) Third, we introduce CLIP feature disentanglement to extract pose, expression, and lighting information from the target image, improving fidelity. (d) Further, we introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping, with an additional feature of head swapping. Ours can swap hair and even accessories, beyond traditional face swapping. Unlike prior works reliant on multiple off-the-shelf models, ours is a relatively unified approach and so it is resilient to errors in other off-the-shelf models. Extensive experiments on FFHQ and CelebA datasets validate the efficacy and robustness of our approach, showcasing high-fidelity, realistic face-swapping with minimal inference time. Our code is available at https://github.com/Sanoojan/REFace.
Related papers
- Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On [29.217423805933727]
Diffusion model-based approaches have recently become popular, as they are excellent at image synthesis tasks.
We propose an Texture-Preserving Diffusion (TPD) model for virtual try-on, which enhances the fidelity of the results.
Second, we propose a novel diffusion-based method that predicts a precise inpainting mask based on the person and reference garment images.
arXiv Detail & Related papers (2024-04-01T12:43:22Z) - BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed
Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
BrushNet's superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z) - CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using
Score-Based Diffusion Models [57.9771859175664]
Recent generative-prior-based methods have shown promising blind face restoration performance.
Generating fine-grained facial details faithful to inputs remains a challenging problem.
We introduce a diffusion-based-prior inside a VQGAN architecture that focuses on learning the distribution over uncorrupted latent embeddings.
arXiv Detail & Related papers (2024-02-08T23:51:49Z) - High-Fidelity Face Swapping with Style Blending [16.024260677867076]
We propose an innovative end-to-end framework for high-fidelity face swapping.
First, we introduce a StyleGAN-based facial attributes encoder that extracts essential features from faces and inverts them into a latent style code.
Second, we introduce an attention-based style blending module to effectively transfer Face IDs from source to target.
arXiv Detail & Related papers (2023-12-17T23:22:37Z) - Personalized Face Inpainting with Diffusion Models by Parallel Visual
Attention [55.33017432880408]
This paper proposes the use of Parallel Visual Attention (PVA) in conjunction with diffusion models to improve inpainting results.
We train the added attention modules and identity encoder on CelebAHQ-IDI, a dataset proposed for identity-preserving face inpainting.
Experiments demonstrate that PVA attains unparalleled identity resemblance in both face inpainting and face inpainting with language guidance tasks.
arXiv Detail & Related papers (2023-12-06T15:39:03Z) - DiffFace: Diffusion-based Face Swapping with Facial Guidance [24.50570533781642]
We propose a diffusion-based face swapping framework for the first time, called DiffFace.
It is composed of training ID conditional DDPM, sampling with facial guidance, and a target-preserving blending.
DiffFace achieves better benefits such as training stability, high fidelity, diversity of the samples, and controllability.
arXiv Detail & Related papers (2022-12-27T02:51:46Z) - Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z) - FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment [28.83743270895698]
We present Face Swapping GAN (FSGAN) for face swapping and reenactment.
Unlike previous work, we offer a subject swapping scheme that can be applied to pairs of faces without requiring training on those faces.
We derive a novel iterative deep learning--based approach for face reenactment which adjusts significant pose and expression variations that can be applied to a single image or a video sequence.
For video sequences, we introduce a continuous agnostic of the face views based on reenactment, Delaunay Triangulation, and bary coordinates. Occluded face regions are handled by a face completion network.
arXiv Detail & Related papers (2022-02-25T21:04:39Z) - Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo
Collection [65.92058628082322]
Non-parametric face modeling aims to reconstruct 3D face only from images without shape assumptions.
This paper presents a novel Learning to Aggregate and Personalize framework for unsupervised robust 3D face modeling.
arXiv Detail & Related papers (2021-06-15T03:10:17Z) - FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping [43.236261887752065]
We propose a novel two-stage framework, called FaceShifter, for high fidelity and occlusion aware face swapping.
In its first stage, our framework generates the swapped face in high-fidelity by exploiting and integrating the target attributes thoroughly and adaptively.
To address the challenging facial synthesiss, we append a second stage consisting of a novel Heuristic Error Acknowledging Refinement Network (HEAR-Net)
arXiv Detail & Related papers (2019-12-31T17:57:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.