VQFR: Blind Face Restoration with Vector-Quantized Dictionary and
Parallel Decoder
- URL: http://arxiv.org/abs/2205.06803v1
- Date: Fri, 13 May 2022 17:54:40 GMT
- Title: VQFR: Blind Face Restoration with Vector-Quantized Dictionary and
Parallel Decoder
- Authors: Yuchao Gu, Xintao Wang, Liangbin Xie, Chao Dong, Gen Li, Ying Shan,
Ming-Ming Cheng
- Abstract summary: We propose a VQ-based face restoration method -- VQFR.
VQFR takes advantage of high-quality low-level feature banks extracted from high-quality faces.
To further fuse low-level features from inputs while not "contaminating" the realistic details generated from the VQ codebook, we propose a parallel decoder.
- Score: 83.63843671885716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although generative facial prior and geometric prior have recently
demonstrated high-quality results for blind face restoration, producing
fine-grained facial details faithful to inputs remains a challenging problem.
Motivated by the classical dictionary-based methods and the recent vector
quantization (VQ) technique, we propose a VQ-based face restoration method --
VQFR. VQFR takes advantage of high-quality low-level feature banks extracted
from high-quality faces and can thus help recover realistic facial details.
However, the simple application of the VQ codebook cannot achieve good results
with faithful details and identity preservation. Therefore, we further
introduce two special network designs. (1) We first investigate the compression
patch size in the VQ codebook and find that a VQ codebook designed with a
proper compression patch size is crucial for balancing quality and fidelity.
(2) To further fuse low-level features from the inputs without "contaminating"
the realistic details generated from the VQ codebook, we propose a parallel
decoder consisting of a texture decoder and a main decoder. These two decoders
then interact through a texture warping module based on deformable convolution.
Equipped with the VQ codebook as a facial detail dictionary and the parallel
decoder design, the proposed VQFR can substantially enhance the restored quality of
facial details while maintaining fidelity comparable to previous methods. Code will be
available at https://github.com/TencentARC/VQFR.
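The core mechanism the abstract relies on, vector quantization against a learned codebook, can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the standard nearest-neighbor lookup that replaces each input feature with its closest codebook entry, with a toy hand-picked codebook:

```python
import numpy as np

def vq_lookup(features, codebook):
    """Standard VQ step: replace each feature with its nearest code vector.

    features: (N, D) array of patch features
    codebook: (K, D) array of learned code vectors
    Returns the quantized features and the chosen code indices.
    """
    # squared L2 distance from every feature to every code, via broadcasting
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)  # index of the nearest code for each feature
    return codebook[idx], idx

# toy example: 4 two-dimensional features, codebook of 3 codes
codes = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
feats = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 0.9], [0.0, 0.1]])
quant, idx = vq_lookup(feats, codes)  # idx -> [0, 1, 2, 0]
```

In VQFR the codebook entries play the role of a high-quality facial detail dictionary, so a degraded patch feature is mapped onto the closest clean-face feature.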
Related papers
- Prediction and Reference Quality Adaptation for Learned Video Compression [54.58691829087094]
We propose a confidence-based prediction quality adaptation (PQA) module to provide explicit discrimination for the spatial and channel-wise prediction quality difference.
We also propose a reference quality adaptation (RQA) module and an associated repeat-long training strategy to provide dynamic spatially variant filters for diverse reference qualities.
arXiv Detail & Related papers (2024-06-20T09:03:26Z) - CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using
Score-Based Diffusion Models [57.9771859175664]
Recent generative-prior-based methods have shown promising blind face restoration performance.
Generating fine-grained facial details faithful to inputs remains a challenging problem.
We introduce a diffusion-based prior inside a VQGAN architecture that focuses on learning the distribution over uncorrupted latent embeddings.
arXiv Detail & Related papers (2024-02-08T23:51:49Z) - Soft Convex Quantization: Revisiting Vector Quantization with Convex
Optimization [40.1651740183975]
We propose Soft Convex Quantization (SCQ) as a direct substitute for Vector Quantization (VQ).
SCQ works like a differentiable convex optimization (DCO) layer.
We demonstrate its efficacy on the CIFAR-10, GTSRB and LSUN datasets.
arXiv Detail & Related papers (2023-10-04T17:45:14Z) - Finite Scalar Quantization: VQ-VAE Made Simple [26.351016719675766]
We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ).
By appropriately choosing the number of dimensions and values each dimension can take, we obtain the same codebook size as in VQ.
We employ FSQ with MaskGIT for image generation, and with UViM for depth estimation, colorization, and panoptic segmentation.
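The FSQ idea described above, rounding each latent dimension to a small fixed set of values so that the implicit codebook is the product of the per-dimension levels, can be sketched as follows. This is a simplified illustration (the bounding via `tanh` and the level choices here are assumptions for the example, not the paper's exact formulation):

```python
import numpy as np

def fsq(z, levels):
    """Finite scalar quantization: round each latent dimension to one of
    `levels[d]` evenly spaced values in [-1, 1].

    z: (N, D) latents; levels: list of D ints (values per dimension)
    """
    half = (np.asarray(levels) - 1) / 2.0
    bounded = np.tanh(z) * half      # bound each dimension, scale to the grid
    return np.round(bounded) / half  # snap to grid, rescale back to [-1, 1]

levels = [8, 5, 5, 5]  # implicit codebook of 8 * 5 * 5 * 5 = 1000 codes
z = np.array([[0.0, 0.0, 0.0, 0.0],
              [2.0, -2.0, 0.5, -0.5]])
zq = fsq(z, levels)
```

Because quantization is per-scalar, no explicit codebook or nearest-neighbor search is needed; the codebook size is controlled entirely by the choice of levels.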
arXiv Detail & Related papers (2023-09-27T09:13:40Z) - Dual Associated Encoder for Face Restoration [68.49568459672076]
We propose a novel dual-branch framework named DAEFR to restore facial details from low-quality (LQ) images.
Our method introduces an auxiliary LQ branch that extracts crucial information from the LQ inputs.
We evaluate the effectiveness of DAEFR on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-08-14T17:58:33Z) - Perceptual Quality Assessment of Face Video Compression: A Benchmark and
An Effective Method [69.868145936998]
Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs.
The great diversity of distortion types in spatial and temporal domains, ranging from traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA).
We introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos.
arXiv Detail & Related papers (2023-04-14T11:26:09Z) - Rethinking the Objectives of Vector-Quantized Tokenizers for Image
Synthesis [30.654501418221475]
We show that improving the reconstruction fidelity of VQ tokenizers does not necessarily improve the generation ability of generative transformers.
We propose Semantic-Quantized GAN (SeQ-GAN) with two learning phases to balance the two objectives.
Our SeQ-GAN (364M) achieves Frechet Inception Distance (FID) of 6.25 and Inception Score (IS) of 140.9 on 256x256 ImageNet generation.
arXiv Detail & Related papers (2022-12-06T17:58:38Z) - Towards Robust Blind Face Restoration with Codebook Lookup Transformer [94.48731935629066]
Blind face restoration is a highly ill-posed problem that often requires auxiliary guidance.
We show that a learned discrete codebook prior in a small proxy space casts blind face restoration as a code prediction task.
We propose a Transformer-based prediction network, named CodeFormer, to model global composition and context of the low-quality faces.
arXiv Detail & Related papers (2022-06-22T17:58:01Z) - Progressive Semantic-Aware Style Transformation for Blind Face
Restoration [26.66332852514812]
We propose a new progressive semantic-aware style transformation framework, named PSFR-GAN, for face restoration.
The proposed PSFR-GAN makes full use of the semantic (parsing maps) and pixel (LQ images) space information from different scales of input pairs.
Experimental results show that our model trained with synthetic data not only produces more realistic high-resolution results for synthetic LQ inputs but also generalizes better to natural LQ face images.
arXiv Detail & Related papers (2020-09-18T09:27:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.