Disentangled Representation Learning for Controllable Person Image
Generation
- URL: http://arxiv.org/abs/2312.05798v1
- Date: Sun, 10 Dec 2023 07:15:58 GMT
- Title: Disentangled Representation Learning for Controllable Person Image
Generation
- Authors: Wenju Xu, Chengjiang Long, Yongwei Nie, Guanghui Wang
- Abstract summary: We propose a novel framework named DRL-CPG to learn disentangled latent representations for controllable person image generation.
To our knowledge, we are the first to learn disentangled latent representations with transformers for person image generation.
- Score: 29.719070087384512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel framework named DRL-CPG to learn
disentangled latent representations for controllable person image generation,
which can produce realistic person images with desired poses and human
attributes (e.g., pose, head, upper clothes, and pants) provided by various
source persons. Unlike existing works that leverage semantic masks to obtain
the representation of each component, we propose to generate disentangled
latent codes via a novel attribute encoder with transformers, trained in a
curriculum-learning manner that progresses from a relatively easy step to
gradually harder ones. A random component mask-agnostic strategy is introduced
to randomly remove component masks from the person segmentation masks, which
increases the difficulty of training and encourages the transformer encoder to
recognize the underlying boundaries between components. This enables the model
to transfer both the shape and texture of the components. Furthermore, we
propose a novel attribute decoder network that integrates multi-level
attributes (e.g., the structure feature and the attribute representation) with
well-designed Dual Adaptive Denormalization (DAD) residual blocks. Extensive
experiments demonstrate that the proposed approach can transfer both the
texture and shape of different human parts and yield realistic results. To our
knowledge, we are the first to learn disentangled latent representations with
transformers for person image generation.
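The random component mask-agnostic strategy described in the abstract lends itself to a small illustration. The Python sketch below is a minimal, hypothetical rendering of the idea, not the paper's implementation: the component names, the `drop_component_masks` helper, and the curriculum schedule are all assumptions made for illustration.

```python
import random
import numpy as np

# Hypothetical component set for a person parsing map; the paper's exact
# components (head, upper clothes, pants, ...) may differ.
COMPONENTS = ["head", "upper_clothes", "pants", "arms", "background"]

def drop_component_masks(component_masks: dict, drop_prob: float) -> dict:
    """Randomly remove component masks from a person segmentation.

    `component_masks` maps a component name to a binary HxW numpy array.
    A dropped component is zeroed out, so the encoder no longer receives an
    explicit boundary for it and must infer the region from the image itself.
    """
    out = {}
    for name, mask in component_masks.items():
        if name != "background" and random.random() < drop_prob:
            out[name] = np.zeros_like(mask)  # mask removed for this training step
        else:
            out[name] = mask
    return out

def curriculum_drop_prob(epoch: int, total_epochs: int, max_prob: float = 0.5) -> float:
    """Easy-to-hard schedule: keep all masks early, drop more of them later."""
    return max_prob * min(1.0, epoch / max(1, total_epochs // 2))
```

Ramping `drop_prob` up over training mirrors the easy-to-hard curriculum: the encoder first learns with full segmentation guidance and is then gradually forced to discover component boundaries on its own.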
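The abstract does not spell out the Dual Adaptive Denormalization (DAD) blocks, so the following PyTorch sketch only illustrates one plausible, SPADE-style reading of the idea: normalized decoder activations are modulated twice, once by a spatial structure feature and once by a per-component attribute code. The layer names and exact layout are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class DualAdaptiveDenorm(nn.Module):
    """Hypothetical DAD layer: two adaptive modulations on normalized features."""

    def __init__(self, channels: int, struct_channels: int, attr_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # Structure branch: spatially varying scale/shift from the structure feature map.
        self.struct_gamma = nn.Conv2d(struct_channels, channels, 3, padding=1)
        self.struct_beta = nn.Conv2d(struct_channels, channels, 3, padding=1)
        # Attribute branch: global scale/shift from the attribute representation.
        self.attr_gamma = nn.Linear(attr_dim, channels)
        self.attr_beta = nn.Linear(attr_dim, channels)

    def forward(self, x, struct_feat, attr_code):
        # struct_feat is assumed to match x spatially: (B, struct_channels, H, W).
        h = self.norm(x)
        # First modulation: structure-aware, spatially varying.
        h = h * (1 + self.struct_gamma(struct_feat)) + self.struct_beta(struct_feat)
        # Second modulation: attribute-aware, broadcast over spatial dimensions.
        g = self.attr_gamma(attr_code)[..., None, None]
        b = self.attr_beta(attr_code)[..., None, None]
        return h * (1 + g) + b
```

In a residual block, two such layers (each followed by a convolution) could wrap a skip connection, letting the decoder fuse the structure feature and the attribute representation at every resolution.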
Related papers
- Masked Face Recognition with Generative-to-Discriminative Representations [29.035270415311427]
We propose a unified deep network to learn generative-to-discriminative representations for facilitating masked face recognition.
First, we leverage a generative encoder pretrained for face inpainting and finetune it to represent masked faces as category-aware descriptors.
We incorporate a multi-layer convolutional network as a discriminative reformer and learn it to convert the category-aware descriptors into identity-aware vectors.
arXiv Detail & Related papers (2024-05-27T02:20:55Z) - Not All Image Regions Matter: Masked Vector Quantization for
Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework, which consists of a Masked Quantization VAE (MQ-VAE) and a Stackformer, to relieve the model from modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z) - MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
arXiv Detail & Related papers (2023-02-10T20:27:02Z) - The Devil is in the Frequency: Geminated Gestalt Autoencoder for
Self-Supervised Visual Pre-Training [13.087987450384036]
We present a new Masked Image Modeling (MIM) approach, termed Geminated Gestalt Autoencoder (Ge$^2$-AE), for visual pre-training.
Specifically, we equip our model with geminated decoders in charge of reconstructing image contents from both pixel and frequency space.
arXiv Detail & Related papers (2022-04-18T09:22:55Z) - IA-FaceS: A Bidirectional Method for Semantic Face Editing [8.19063619210761]
This paper proposes a bidirectional method for disentangled face attribute manipulation as well as flexible, controllable component editing.
IA-FaceS is the first such method developed without any input visual guidance, such as segmentation masks or sketches.
Both quantitative and qualitative results indicate that the proposed method outperforms the other techniques in reconstruction, face attribute manipulation, and component transfer.
arXiv Detail & Related papers (2022-03-24T14:44:56Z) - Ensembling with Deep Generative Views [72.70801582346344]
generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z) - Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z) - Controllable Person Image Synthesis with Attribute-Decomposed GAN [27.313729413684012]
This paper introduces the Attribute-Decomposed GAN, a novel generative model for controllable person image synthesis.
The core idea of the proposed model is to embed human attributes into the latent space as independent codes.
Experimental results demonstrate the proposed method's superiority over the state of the art in pose transfer.
arXiv Detail & Related papers (2020-03-27T07:47:06Z) - DotFAN: A Domain-transferred Face Augmentation Network for Pose and
Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN).
DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains.
Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z) - Fine-grained Image-to-Image Transformation towards Visual Recognition [102.51124181873101]
We aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image.
We adopt a model based on generative adversarial networks to disentangle the identity related and unrelated factors of an image.
Experiments on the CompCars and Multi-PIE datasets demonstrate that our model preserves the identity of the generated images much better than the state-of-the-art image-to-image transformation models.
arXiv Detail & Related papers (2020-01-12T05:26:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information provided and is not responsible for any consequences arising from its use.