CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using
Score-Based Diffusion Models
- URL: http://arxiv.org/abs/2402.06106v1
- Date: Thu, 8 Feb 2024 23:51:49 GMT
- Title: CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using
Score-Based Diffusion Models
- Authors: Maitreya Suin, Rama Chellappa
- Abstract summary: Recent generative-prior-based methods have shown promising blind face restoration performance.
Generating fine-grained facial details faithful to inputs remains a challenging problem.
We introduce a diffusion-based-prior inside a VQGAN architecture that focuses on learning the distribution over uncorrupted latent embeddings.
- Score: 57.9771859175664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent generative-prior-based methods have shown promising blind face
restoration performance. They usually project the degraded images to the latent
space and then decode high-quality faces either by single-stage latent
optimization or directly from the encoding. Generating fine-grained facial
details faithful to inputs remains a challenging problem. Most existing methods
produce either overly smooth outputs or alter the identity as they attempt to
balance between generation and reconstruction. This may be attributed to the
typical trade-off between quality and resolution in the latent space. If the
latent space is highly compressed, the decoded output is more robust to
degradations but shows worse fidelity. On the other hand, a more flexible
latent space can capture intricate facial details better, but is extremely
difficult to optimize for highly degraded faces using existing techniques. To
address these issues, we introduce a diffusion-based-prior inside a VQGAN
architecture that focuses on learning the distribution over uncorrupted latent
embeddings. With such knowledge, we iteratively recover the clean embedding
conditioned on the degraded counterpart. Furthermore, to ensure the reverse
diffusion trajectory does not deviate from the underlying identity, we train a
separate Identity Recovery Network and use its output to constrain the reverse
diffusion process. Specifically, using a learnable latent mask, we add
gradients from a face-recognition network to a subset of latent features that
correlates with the finer identity-related details in the pixel space, leaving
the other features untouched. Disentanglement between perception and fidelity
in the latent space allows us to achieve the best of both worlds. We perform
extensive evaluations on multiple real and synthetic datasets to validate the
superiority of our approach.
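To make the pipeline described in the abstract concrete, below is a minimal PyTorch-style sketch (not the authors' released code) of the sampling loop it outlines: a conditional diffusion prior iteratively denoises a VQGAN latent conditioned on the degraded embedding, and gradients from a face-recognition network are injected only through a learnable latent mask so that the remaining features are left untouched. All module names (encoder, denoiser, decoder, id_recovery_net, face_rec_net), the guidance weight, and the DDPM step details are illustrative assumptions, not the paper's actual interfaces.

```python
# Illustrative sketch of identity-guided conditional latent refinement.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentRefiner(nn.Module):
    def __init__(self, encoder, denoiser, decoder, id_recovery_net, face_rec_net,
                 betas, latent_channels, id_weight=0.1):
        super().__init__()
        self.encoder = encoder                  # VQGAN encoder: degraded image -> latent
        self.denoiser = denoiser                # conditional noise predictor eps(z_t, t, z_cond)
        self.decoder = decoder                  # VQGAN decoder: latent -> image
        self.id_recovery_net = id_recovery_net  # predicts an identity-faithful reference face
        self.face_rec_net = face_rec_net        # frozen face-recognition feature extractor
        self.register_buffer("betas", betas)                               # (T,)
        self.register_buffer("alpha_bars", torch.cumprod(1.0 - betas, 0))  # (T,)
        # Learnable mask selecting the identity-correlated subset of latent channels.
        self.latent_mask_logits = nn.Parameter(torch.zeros(latent_channels))
        self.id_weight = id_weight

    def _ddpm_step(self, z_t, eps_pred, t):
        # Standard DDPM reverse-step mean (posterior variance simplified for brevity).
        beta_t, alpha_bar_t = self.betas[t], self.alpha_bars[t]
        mean = (z_t - beta_t / torch.sqrt(1.0 - alpha_bar_t) * eps_pred) / torch.sqrt(1.0 - beta_t)
        if t > 0:
            mean = mean + torch.sqrt(beta_t) * torch.randn_like(z_t)
        return mean

    def restore(self, degraded_img, num_steps):
        with torch.no_grad():
            z_cond = self.encoder(degraded_img)          # conditioning latent from the degraded input
            id_ref = self.face_rec_net(self.id_recovery_net(degraded_img))  # target identity embedding
        mask = torch.sigmoid(self.latent_mask_logits).view(1, -1, 1, 1)     # soft mask in [0, 1]
        z_t = torch.randn_like(z_cond)                   # start the reverse trajectory from noise
        for t in reversed(range(num_steps)):
            with torch.no_grad():
                eps_pred = self.denoiser(z_t, t, z_cond)
                z_t = self._ddpm_step(z_t, eps_pred, t)
            # Identity guidance: nudge only the masked latent channels toward the
            # identity predicted by the recovery network; other channels stay untouched.
            z_t = z_t.detach().requires_grad_(True)
            id_pred = self.face_rec_net(self.decoder(z_t))
            id_loss = 1.0 - F.cosine_similarity(id_pred, id_ref, dim=-1).mean()
            grad = torch.autograd.grad(id_loss, z_t)[0]
            z_t = (z_t - self.id_weight * mask * grad).detach()
        return self.decoder(z_t)
```

Restricting the guidance gradient to the masked channels is what the abstract's perception/fidelity disentanglement amounts to in this sketch: the diffusion prior is responsible for perceptual quality everywhere, while identity gradients only touch the features that correlate with fine identity details.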
Related papers
- Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Our approach is relatively unified, which makes it resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z) - DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration [62.44659039265439]
We propose a Diffusion-Information-Diffusion framework to tackle blind face restoration.
DiffMAC achieves high-generalization face restoration in diverse degraded scenes and heterogeneous domains.
Results demonstrate the superiority of DiffMAC over state-of-the-art methods.
arXiv Detail & Related papers (2024-03-15T08:44:15Z) - SARGAN: Spatial Attention-based Residuals for Facial Expression
Manipulation [1.7056768055368383]
We present a novel method named SARGAN that addresses the limitations of existing facial expression manipulation methods from three perspectives.
We exploit a symmetric encoder-decoder network to attend to facial features at multiple scales.
Our proposed model performs significantly better than state-of-the-art methods.
arXiv Detail & Related papers (2023-03-30T08:15:18Z) - DifFace: Blind Face Restoration with Diffused Error Contraction [62.476329680424975]
DifFace is capable of coping with unseen and complex degradations more gracefully without complicated loss designs.
It is superior to current state-of-the-art methods, especially in cases with severe degradations.
arXiv Detail & Related papers (2022-12-13T11:52:33Z) - High-resolution Face Swapping via Latent Semantics Disentanglement [50.23624681222619]
We present a novel high-resolution, hallucination-free face swapping method using the inherent prior knowledge of a pre-trained GAN model.
We explicitly disentangle the latent semantics by utilizing the progressive nature of the generator.
We extend our method to video face swapping by enforcing two spatio-temporal constraints on the latent space and the image space.
arXiv Detail & Related papers (2022-03-30T00:33:08Z) - SuperFront: From Low-resolution to High-resolution Frontal Face
Synthesis [65.35922024067551]
We propose a generative adversarial network (GAN)-based model to generate high-quality, identity-preserving frontal faces.
Specifically, we propose SuperFront-GAN to synthesize a high-resolution (HR), frontal face from one-to-many LR faces with various poses.
We integrate a super-resolution side-view module into SF-GAN to preserve identity information and fine details of the side-views in HR space.
arXiv Detail & Related papers (2020-12-07T23:30:28Z)