Learning Position-Aware Implicit Neural Network for Real-World Face
Inpainting
- URL: http://arxiv.org/abs/2401.10537v1
- Date: Fri, 19 Jan 2024 07:31:44 GMT
- Title: Learning Position-Aware Implicit Neural Network for Real-World Face
Inpainting
- Authors: Bo Zhao, Huan Yang and Jianlong Fu
- Abstract summary: Face inpainting requires the model to have a precise global understanding of the facial position structure.
In this paper, we propose an Implicit Neural Inpainting Network (IN$^2$) to handle arbitrary-shape face images in real-world scenarios.
- Score: 55.87303287274932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face inpainting requires the model to have a precise global understanding of
the facial position structure. Benefiting from the powerful capabilities of
deep learning backbones, recent works in face inpainting have achieved decent
performance in the ideal setting (square images at $512$px). However, existing
methods often produce visually unpleasant results, especially in
position-sensitive details (e.g., eyes and nose), when applied directly to
arbitrary-shaped images in real-world scenarios. These artifacts in
position-sensitive details expose the shortcomings of existing methods in
processing position information. In this paper, we propose
an \textbf{I}mplicit \textbf{N}eural \textbf{I}npainting \textbf{N}etwork
(IN$^2$) to handle arbitrary-shape face images in real-world scenarios by
explicit modeling for position information. Specifically, a downsample
processing encoder is proposed to reduce information loss while obtaining the
global semantic feature. A neighbor hybrid attention block is proposed with a
hybrid attention mechanism to improve the facial understanding ability of the
model without restricting the shape of the input. Finally, an implicit neural
pyramid decoder is introduced to explicitly model position information and
bridge the gap between low-resolution features and high-resolution output.
Extensive experiments demonstrate the superiority of the proposed method on
the real-world face inpainting task.
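The abstract gives no implementation details, but the core idea of the decoder, querying a low-resolution feature map at continuous positions so the output is not tied to a fixed shape, can be illustrated with a minimal LIIF-style sketch in PyTorch (the module name, layer sizes, and bilinear sampling are assumptions, not the authors' design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitDecoder(nn.Module):
    """Decode RGB at continuous (x, y) positions from a low-res feature map,
    so output resolution and aspect ratio are not fixed by the encoder."""
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # RGB
        )

    def forward(self, feat, coords):
        # feat: (B, C, h, w) encoder features; coords: (B, N, 2) in [-1, 1]
        sampled = F.grid_sample(feat, coords.unsqueeze(1), mode='bilinear',
                                align_corners=False)   # (B, C, 1, N)
        sampled = sampled.squeeze(2).permute(0, 2, 1)  # (B, N, C)
        # concatenating the query position itself makes position explicit
        return self.mlp(torch.cat([sampled, coords], dim=-1))
```

Evaluating a dense grid of coordinates at any target height and width then yields an output whose shape is decoupled from the encoder's feature grid, which is the sense in which position can be modeled explicitly.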
Related papers
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
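As a rough sketch of the feature-interpolation step described above (the linear schedule and function signature are assumptions; DiffUHaul's actual schedule may differ):

```python
import torch

def blend_attention_features(src_feat, tgt_feat, step, n_early_steps):
    """In early denoising steps, mix source-image attention features into
    the target ones so the new layout keeps the original appearance."""
    if step >= n_early_steps:
        return tgt_feat                   # late steps: target features only
    alpha = step / n_early_steps          # 0 -> all source, 1 -> all target
    return torch.lerp(src_feat, tgt_feat, alpha)
```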
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
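The summary suggests a decoder that iteratively refines a set of learnable queries against image features; a minimal cross-attention sketch of that pattern (all names and sizes assumed) might look like:

```python
import torch
import torch.nn as nn

class QueryRefiner(nn.Module):
    """Learnable queries cross-attend to person-image features and are
    refined layer by layer into a coarse-grained semantic prompt."""
    def __init__(self, n_queries=16, dim=256, n_layers=3):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, img_feats):                    # (B, N, dim)
        q = self.queries.unsqueeze(0).expand(img_feats.size(0), -1, -1)
        for attn in self.layers:
            out, _ = attn(q, img_feats, img_feats)   # queries attend to image
            q = q + out                              # residual refinement
        return q                                     # (B, n_queries, dim)
```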
arXiv Detail & Related papers (2024-02-28T06:07:07Z) - T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown strong performance in natural language processing.
In this paper, we design a novel attention mechanism whose complexity is linear in the resolution, derived via Taylor expansion, and build on it a network called $T$-former for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
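The exact T-former formulation is not given here, but a generic linear attention built from the first-order Taylor expansion of the exponential similarity, $\exp(q \cdot k) \approx 1 + q \cdot k$, illustrates how attention cost can become linear in resolution (a sketch, not the paper's code):

```python
import torch
import torch.nn.functional as F

def taylor_linear_attention(q, k, v):
    """Attention with sim(q, k) = 1 + q.k, the first-order Taylor expansion
    of exp(q.k); L2-normalizing q and k keeps the similarity non-negative."""
    q = F.normalize(q, dim=-1)                    # (B, N, d)
    k = F.normalize(k, dim=-1)                    # (B, N, d)
    kv = torch.einsum('bnd,bne->bde', k, v)       # key-value summary, O(N d^2)
    num = v.sum(1, keepdim=True) + torch.einsum('bnd,bde->bne', q, kv)
    den = k.size(1) + torch.einsum('bnd,bd->bn', q, k.sum(1)).unsqueeze(-1)
    return num / den                              # (B, N, d_v)
```

Because the key-value summary `kv` is computed once, the cost is O(N·d²) rather than the O(N²·d) of vanilla softmax attention.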
arXiv Detail & Related papers (2023-05-12T04:10:42Z) - Outpainting by Queries [23.626012684754965]
We propose a novel hybrid vision-transformer-based encoder-decoder framework, named Query Outpainting TRansformer (QueryOTR).
We show experimentally that QueryOTR generates smoother and more realistic results than state-of-the-art image outpainting approaches.
arXiv Detail & Related papers (2022-07-12T04:48:41Z) - Compressible-composable NeRF via Rank-residual Decomposition [21.92736190195887]
Neural Radiance Field (NeRF) has emerged as a compelling method to represent 3D objects and scenes for photo-realistic rendering.
We present a neural representation that enables efficient and convenient manipulation of models.
Our method is able to achieve comparable rendering quality to state-of-the-art methods, while enabling extra capability of compression and composition.
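A toy 2D analogue conveys the rank-residual idea (NeRF itself factorizes 3D grids, and the residual training scheme is only gestured at here; all names are assumptions):

```python
import torch
import torch.nn as nn

class RankResidualPlane(nn.Module):
    """A feature plane stored as a sum of rank-1 terms u_r v_r^T; if training
    pushes most of the signal into the leading ranks, dropping trailing
    ranks compresses the model with a graceful quality falloff."""
    def __init__(self, h, w, max_rank=8):
        super().__init__()
        self.u = nn.Parameter(torch.randn(max_rank, h) * 0.1)
        self.v = nn.Parameter(torch.randn(max_rank, w) * 0.1)

    def forward(self, keep_rank=None):
        r = keep_rank or self.u.size(0)
        # sum of the first r rank-1 outer products
        return torch.einsum('rh,rw->hw', self.u[:r], self.v[:r])
```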
arXiv Detail & Related papers (2022-05-30T06:18:59Z) - Unconstrained Face Sketch Synthesis via Perception-Adaptive Network and
A New Benchmark [16.126100433405398]
We argue that accurately perceiving facial region and facial components is crucial for unconstrained sketch synthesis.
We propose a novel Perception-Adaptive Network (PANet), which can generate high-quality face sketches under unconstrained conditions.
We introduce a new benchmark termed WildSketch, which contains 800 pairs of face photo-sketch with large variations in pose, expression, ethnic origin, background, and illumination.
arXiv Detail & Related papers (2021-12-02T07:08:31Z) - Face Sketch Synthesis via Semantic-Driven Generative Adversarial Network [10.226808267718523]
We propose a novel Semantic-Driven Generative Adversarial Network (SDGAN) which embeds global structure-level style injection and local class-level knowledge re-weighting.
Specifically, we conduct facial saliency detection on the input face photos to provide overall facial texture structure.
In addition, we exploit face parsing layouts as the semantic-level spatial prior to enforce globally structural style injection in the generator of SDGAN.
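The summary does not spell out how the parsing layout enters the generator; one standard realization of structural style injection from a spatial semantic prior is SPADE-style spatially-adaptive normalization, sketched here as a stand-in (not necessarily SDGAN's mechanism):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticInjection(nn.Module):
    """A face-parsing layout predicts per-pixel scale and bias that
    modulate normalized generator features (SPADE-style)."""
    def __init__(self, feat_ch, parsing_ch):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)
        self.to_gamma = nn.Conv2d(parsing_ch, feat_ch, 3, padding=1)
        self.to_beta = nn.Conv2d(parsing_ch, feat_ch, 3, padding=1)

    def forward(self, x, parsing):
        # resize the layout to the current feature resolution
        parsing = F.interpolate(parsing, size=x.shape[2:], mode='nearest')
        return self.norm(x) * (1 + self.to_gamma(parsing)) + self.to_beta(parsing)
```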
arXiv Detail & Related papers (2021-06-29T07:03:56Z) - Inverting Generative Adversarial Renderer for Face Reconstruction [58.45125455811038]
In this work, we introduce a novel Generative Adversarial Renderer (GAR).
Instead of relying on graphics rules, GAR learns to model complicated real-world images and is capable of producing realistic results.
Our method achieves state-of-the-art performance on multiple face reconstruction benchmarks.
arXiv Detail & Related papers (2021-05-06T04:16:06Z) - Deep Generation of Face Images from Sketches [36.146494762987146]
Deep image-to-image translation techniques allow fast generation of face images from freehand sketches.
Existing solutions tend to overfit to sketches, thus requiring professional sketches or even edge maps as input.
We propose to implicitly model the shape space of plausible face images and synthesize a face image in this space to approximate an input sketch.
Our method essentially uses input sketches as soft constraints and is thus able to produce high-quality face images even from rough and/or incomplete sketches.
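One way to picture "input sketches as soft constraints" is projecting a sketch feature toward a learned space of plausible faces; the paper learns this space with deep models, but a PCA-subspace stand-in conveys the idea (everything here is an illustrative assumption):

```python
import torch

def soft_project(f, basis, mean, alpha=0.8):
    """Pull a sketch feature toward the span of plausible-face features.
    basis: (k, d) orthonormal rows; mean: (d,); alpha < 1 keeps the
    sketch a soft constraint rather than a hard one."""
    coords = (f - mean) @ basis.T        # coordinates in the subspace
    f_proj = mean + coords @ basis       # nearest point on the subspace
    return torch.lerp(f, f_proj, alpha)  # blend: the rough sketch still counts
```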
arXiv Detail & Related papers (2020-06-01T16:20:23Z) - Exploiting Semantics for Face Image Deblurring [121.44928934662063]
We propose an effective and efficient face deblurring algorithm by exploiting semantic cues via deep convolutional neural networks.
We incorporate face semantic labels as input priors and propose an adaptive structural loss to regularize facial local structures.
The proposed method restores sharp images with more accurate facial features and details.
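The adaptive structural loss is not specified in this summary; a plausible reading, weighting the reconstruction error by facial region so that components like eyes and nose count more, can be sketched as follows (the per-label weight table is an assumption):

```python
import torch

def structural_l1_loss(pred, target, parsing, class_weights):
    """L1 loss reweighted per pixel by its face-parsing label, so errors
    inside position-sensitive components are penalized more heavily."""
    # parsing: (B, 1, H, W) integer labels; class_weights: (num_classes,)
    w = class_weights[parsing]               # broadcast per-pixel weights
    return (w * (pred - target).abs()).mean()
```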
arXiv Detail & Related papers (2020-01-19T13:06:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.