Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding
- URL: http://arxiv.org/abs/2512.05039v1
- Date: Thu, 04 Dec 2025 17:56:08 GMT
- Title: Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding
- Authors: Abhigyan Bhattacharya, Hiranmoy Roy
- Abstract summary: Facial image inpainting aims to restore the missing or corrupted regions in face images while preserving identity, structural consistency and image quality. Existing methods face problems with large irregular masks, often producing blurry textures on the edges of the masked region. We propose a novel architecture that addresses these challenges through semantic-guided hierarchical synthesis.
- Score: 1.7761223012399532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial image inpainting aims to restore the missing or corrupted regions in face images while preserving identity, structural consistency and photorealistic image quality, a task of particular importance for photo restoration. Despite many recent advances in deep generative models, existing methods struggle with large irregular masks, often producing blurry textures on the edges of the masked region, semantic inconsistencies, or unconvincing facial structures, owing to their direct pixel-level synthesis approach and limited exploitation of facial priors. In this paper we propose a novel architecture that addresses these challenges through semantic-guided hierarchical synthesis. Our approach first synthesizes a semantic layout and then refines the texture, so that the facial structure is established before detailed images are generated. In the first stage, we combine two kinds of encoding: CNNs for local features and Vision Transformers for global features. This yields clear and detailed semantic layouts. In the second stage, we use a Multi-Modal Texture Generator to refine these layouts by drawing on information from different scales, so that the result is cohesive and consistent. The architecture naturally handles arbitrary mask configurations through dynamic attention without mask-specific training. Experiments on two datasets, CelebA-HQ and FFHQ, show that our model outperforms other state-of-the-art methods, with improvements in metrics such as LPIPS, PSNR, and SSIM. It produces visually striking results with better semantic preservation in challenging large-area inpainting situations.
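For readers unfamiliar with the reported metrics, PSNR is the simplest of the three to state. The sketch below is a minimal illustration under stated assumptions, not the authors' evaluation code: images are represented as flat lists of pixel values, the peak value is assumed to be 255 (8-bit images), and the function name `psnr` and its parameters are hypothetical.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two images.

    Both images are flat sequences of pixel values of equal length.
    max_value is the assumed peak pixel value (255 for 8-bit images).
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy usage with made-up pixel values (not data from the paper).
ground_truth = [0, 255, 128, 64]
inpainted = [0, 250, 130, 64]
print(f"PSNR: {psnr(ground_truth, inpainted):.2f} dB")
```

Higher PSNR means the restored image is closer to the reference pixel by pixel. LPIPS and SSIM capture perceptual and structural similarity instead, and are not covered by this sketch.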
Related papers
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
BrushNet shows superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
- PRISM: Progressive Restoration for Scene Graph-based Image Manipulation [47.77003316561398]
PRISM is a novel multi-head image manipulation approach to improve the accuracy and quality of the manipulated regions in the scene.
Our results demonstrate the potential of our approach for enhancing the quality and precision of scene graph-based image manipulation.
arXiv Detail & Related papers (2023-11-03T21:30:34Z)
- Semantic Image Translation for Repairing the Texture Defects of Building Models [16.764719266178655]
We introduce a novel approach for synthesizing façade texture images that authentically reflect the architectural style from a structured label map.
Our proposed method is also capable of synthesizing texture images with specific styles for façades that lack pre-existing textures.
arXiv Detail & Related papers (2023-03-30T14:38:53Z)
- Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid [102.24539566851809]
Restoring reasonable and realistic content for arbitrary missing regions in images is an important yet challenging task.
Recent image inpainting models have made significant progress in generating vivid visual details, but they can still lead to texture blurring or structural distortions.
We propose the Semantic Pyramid Network (SPN) motivated by the idea that learning multi-scale semantic priors can greatly benefit the recovery of locally missing content in images.
arXiv Detail & Related papers (2021-12-08T04:33:33Z)
- Self-supervised High-fidelity and Re-renderable 3D Facial Reconstruction from a Single Image [19.0074836183624]
We propose a novel self-supervised learning framework for reconstructing high-quality 3D faces from single-view images in-the-wild.
Our framework substantially outperforms state-of-the-art approaches in both qualitative and quantitative comparisons.
arXiv Detail & Related papers (2021-11-16T08:10:24Z)
- FT-TDR: Frequency-guided Transformer and Top-Down Refinement Network for Blind Face Inpainting [77.78305705925376]
Blind face inpainting refers to the task of reconstructing visual contents without explicitly indicating the corrupted regions in a face image.
We propose a novel two-stage blind face inpainting method named Frequency-guided Transformer and Top-Down Refinement Network (FT-TDR) to tackle these challenges.
arXiv Detail & Related papers (2021-08-10T03:12:01Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- Foreground-guided Facial Inpainting with Fidelity Preservation [7.5089719291325325]
We propose a foreground-guided facial inpainting framework that can extract and generate facial features using convolutional neural network layers.
Specifically, we propose a new loss function with semantic reasoning capability over facial expressions and natural and unnatural features (make-up).
Our proposed method achieved comparable quantitative results when compared to the state of the art, but qualitatively it demonstrated high-fidelity preservation of facial components.
arXiv Detail & Related papers (2021-05-07T15:50:58Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.