Controllable Person Image Synthesis with Spatially-Adaptive Warped
Normalization
- URL: http://arxiv.org/abs/2105.14739v2
- Date: Thu, 3 Jun 2021 01:24:00 GMT
- Title: Controllable Person Image Synthesis with Spatially-Adaptive Warped
Normalization
- Authors: Jichao Zhang, Aliaksandr Siarohin, Hao Tang, Jingjing Chen, Enver
Sangineto, Wei Wang, Nicu Sebe
- Abstract summary: Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
- Score: 72.65828901909708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Controllable person image generation aims to produce realistic human images
with desirable attributes (e.g., the given pose, cloth textures or hair style).
However, the large spatial misalignment between the source and target images
makes standard image-to-image translation architectures unsuitable for this
task. Most state-of-the-art architectures avoid the alignment
step during the generation, which causes many artifacts, especially for person
images with complex textures. To solve this problem, we introduce a novel
Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned
flow-field to warp modulation parameters. This allows us to efficiently align
the person's spatially-adaptive styles with the pose features. Moreover, we propose a
novel self-training part replacement strategy to refine the pretrained model
for the texture-transfer task, significantly improving the quality of the
generated cloth and the preservation ability of irrelevant regions. Our
experimental results on the widely used DeepFashion dataset demonstrate a
significant improvement of the proposed method over the state-of-the-art
methods on both pose-transfer and texture-transfer tasks. The source code is
available at https://github.com/zhangqianhui/Sawn.
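To make the core idea concrete, here is a minimal NumPy sketch of a SAWN-style modulation step: per-pixel scale (gamma) and shift (beta) maps are first warped by a flow field and then used to modulate a normalized feature map, SPADE-style. This is only an illustration of the mechanism; the function names (`warp`, `sawn_like_norm`) are hypothetical, and the paper's actual implementation operates on CNN feature tensors with a learned flow and differentiable (bilinear) sampling, whereas this sketch uses nearest-neighbour sampling for clarity.

```python
import numpy as np

def warp(param, flow):
    """Warp a per-pixel parameter map (H, W) by a flow field (H, W, 2)
    using nearest-neighbour sampling; flow[..., 0] is the row offset,
    flow[..., 1] the column offset. (Illustrative; a real model would
    use differentiable bilinear sampling.)"""
    H, W = param.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return param[src_y, src_x]

def sawn_like_norm(feat, gamma, beta, flow, eps=1e-5):
    """Normalize a feature map (H, W), then modulate it with
    flow-warped scale/shift maps, in the spirit of SAWN."""
    normed = (feat - feat.mean()) / np.sqrt(feat.var() + eps)
    return (1.0 + warp(gamma, flow)) * normed + warp(beta, flow)
```

With a zero flow this reduces to ordinary spatially-adaptive normalization; a non-zero flow moves the style parameters to new spatial locations, which is how the misaligned source styles can be brought into register with the target pose features.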
Related papers
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- ENTED: Enhanced Neural Texture Extraction and Distribution for
Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
- ReGeneration Learning of Diffusion Models with Rich Prompts for
Zero-Shot Image Translation [8.803251014279502]
Large-scale text-to-image models have demonstrated amazing ability to synthesize diverse and high-fidelity images.
Current models can impose significant changes to the original image content during the editing process.
We propose ReGeneration learning in an image-to-image Diffusion model (ReDiffuser).
arXiv Detail & Related papers (2023-05-08T12:08:12Z)
- Semantic Image Translation for Repairing the Texture Defects of Building
Models [16.764719266178655]
We introduce a novel approach for synthesizing facade texture images that authentically reflect the architectural style from a structured label map.
Our proposed method is also capable of synthesizing texture images with specific styles for facades that lack pre-existing textures.
arXiv Detail & Related papers (2023-03-30T14:38:53Z)
- Masked and Adaptive Transformer for Exemplar Based Image Translation [16.93344592811513]
Cross-domain semantic matching is challenging.
We propose a masked and adaptive transformer (MAT) for learning accurate cross-domain correspondence.
We devise a novel contrastive style learning method to acquire quality-discriminative style representations.
arXiv Detail & Related papers (2023-03-30T03:21:14Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z)
- Region-adaptive Texture Enhancement for Detailed Person Image Synthesis [86.69934638569815]
RATE-Net is a novel framework for synthesizing person images with sharp texture details.
The proposed framework leverages an additional texture enhancing module to extract appearance information from the source image.
Experiments conducted on DeepFashion benchmark dataset have demonstrated the superiority of our framework compared with existing networks.
arXiv Detail & Related papers (2020-05-26T02:33:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.