UPGPT: Universal Diffusion Model for Person Image Generation, Editing
and Pose Transfer
- URL: http://arxiv.org/abs/2304.08870v2
- Date: Wed, 26 Jul 2023 17:13:12 GMT
- Title: UPGPT: Universal Diffusion Model for Person Image Generation, Editing
and Pose Transfer
- Authors: Soon Yau Cheong, Armin Mustafa, Andrew Gilbert
- Abstract summary: Text-to-image models (T2I) have been used to generate high-quality images of people.
However, due to the random nature of the generation process, the person's appearance differs from sample to sample, even with the same prompt.
We propose a multimodal diffusion model that accepts text, pose, and visual prompting.
- Score: 15.15576618501609
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text-to-image models (T2I) such as StableDiffusion have been used to generate
high-quality images of people. However, due to the random nature of the
generation process, the person has a different appearance, e.g. pose, face, and
clothing, despite using the same text prompt. This appearance inconsistency
makes T2I unsuitable for pose transfer. We address this by proposing a
multimodal diffusion model that accepts text, pose, and visual prompting. Our
model is the first unified method to perform all person image tasks:
generation, pose transfer, and mask-less editing. We also pioneer using
low-dimensional 3D body model parameters directly to demonstrate a new
capability: simultaneous pose and camera-view interpolation while maintaining
the person's appearance.
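The abstract describes a single diffusion model conditioned on any combination of text, pose, and visual prompts, with pose given as a small 3D body-model parameter vector that can be interpolated. The paper does not publish this mechanism in the abstract, so the following is only a minimal sketch of one plausible conditioning scheme: each modality is projected into a shared conditioning space and summed, and pose interpolation is plain linear blending of the parameter vector. All dimensions, matrices, and function names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical embedding dimensions -- illustrative only, not from the paper.
TEXT_DIM, POSE_DIM, VISUAL_DIM, COND_DIM = 8, 6, 8, 16

rng = np.random.default_rng(0)

# Toy projection matrices standing in for learned per-modality encoders.
W_text = rng.standard_normal((TEXT_DIM, COND_DIM))
W_pose = rng.standard_normal((POSE_DIM, COND_DIM))
W_vis = rng.standard_normal((VISUAL_DIM, COND_DIM))

def multimodal_condition(text_emb, pose_params, visual_emb):
    """Project each provided modality into a shared space and sum them.

    A missing modality (None) contributes nothing, which is one simple way
    a single model could serve generation (text only), pose transfer
    (pose + visual reference), and editing (all three) with one interface.
    """
    cond = np.zeros(COND_DIM)
    if text_emb is not None:
        cond += text_emb @ W_text
    if pose_params is not None:
        cond += pose_params @ W_pose
    if visual_emb is not None:
        cond += visual_emb @ W_vis
    return cond

def interpolate_pose(p0, p1, t):
    # Because the pose is a small parameter vector rather than an image,
    # linear interpolation yields smooth in-between poses / camera views.
    return (1.0 - t) * p0 + t * p1

# Pose transfer: condition on pose + appearance reference, no text prompt.
c = multimodal_condition(None, np.ones(POSE_DIM), np.ones(VISUAL_DIM))
print(c.shape)  # (16,)
```

In this sketch the conditioning vector `c` would be fed to the denoising network at each diffusion step; sweeping `t` in `interpolate_pose` between two parameter vectors is what would produce the simultaneous pose and camera-view interpolation the abstract mentions.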
Related papers
- From Text to Pose to Image: Improving Diffusion Model Control and Quality [0.5183511047901651]
We introduce a text-to-pose (T2P) generative model alongside a new sampling algorithm, and a new pose adapter that incorporates more pose keypoints for higher pose fidelity.
Together, these two new state-of-the-art models enable, for the first time, a generative text-to-pose-to-image framework for higher pose control in diffusion models.
arXiv Detail & Related papers (2024-11-19T21:34:50Z)
- PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation [38.958695275774616]
We introduce a new transformer-based model, trained in a retrieval fashion, which can take as input any combination of the aforementioned modalities.
We showcase the potential of such an embroidered pose representation for (1) SMPL regression from image with optional text cue; and (2) on the task of fine-grained instruction generation.
arXiv Detail & Related papers (2024-09-10T14:09:39Z)
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis [40.869862603815875]
VLOGGER is a method for audio-driven human video generation from a single input image.
We use a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls.
We show applications in video editing and personalization.
arXiv Detail & Related papers (2024-03-13T17:59:02Z)
- Synthesizing Moving People with 3D Control [88.68284137105654]
We present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence.
For the first part, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image.
Second, we develop a diffusion-based rendering pipeline, which is controlled by 3D human poses.
arXiv Detail & Related papers (2024-01-19T18:59:11Z)
- MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion [22.62170098534097]
We propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting.
By leveraging the prior knowledge of image diffusion models, MagicPose generalizes well to unseen human identities and complex poses.
The proposed model is easy to use and can be considered as a plug-in module/extension to Stable Diffusion.
arXiv Detail & Related papers (2023-11-18T10:22:44Z)
- AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections [78.81539337399391]
We present an animatable 3D-aware GAN that generates portrait images with controllable facial expression, head pose, and shoulder movements.
It is a generative model trained on unstructured 2D image collections without using 3D or video data.
A dual-camera rendering and adversarial learning scheme is proposed to improve the quality of the generated faces.
arXiv Detail & Related papers (2023-09-05T12:44:57Z)
- Progressive and Aligned Pose Attention Transfer for Person Image Generation [59.87492938953545]
This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose.
We use two types of blocks, namely the Pose-Attentional Transfer Block (PATB) and the Aligned Pose-Attentional Transfer Block (APATB).
We verify the efficacy of the model on the Market-1501 and DeepFashion datasets, using quantitative and qualitative measures.
arXiv Detail & Related papers (2021-03-22T07:24:57Z)
- HumanGAN: A Generative Model of Human Images [78.6284090004218]
We present a generative model for images of dressed humans offering control over pose, local body part appearance and garment style.
Our model encodes part-based latent appearance vectors in a normalized, pose-independent space and warps them to different poses, preserving body and clothing appearance under varying posture.
arXiv Detail & Related papers (2021-03-11T19:00:38Z)
- PISE: Person Image Synthesis and Editing with Decoupled GAN [64.70360318367943]
We propose PISE, a novel two-stage generative model for Person Image Synthesis and Editing.
For human pose transfer, we first synthesize a human parsing map aligned with the target pose to represent the shape of clothing.
To decouple the shape and style of clothing, we propose joint global and local per-region encoding and normalization.
arXiv Detail & Related papers (2021-03-06T04:32:06Z)
- PoNA: Pose-guided Non-local Attention for Human Pose Transfer [105.14398322129024]
We propose a new human pose transfer method using a generative adversarial network (GAN) with simplified cascaded blocks.
Our model generates sharper and more realistic images with rich details, while having fewer parameters and faster speed.
arXiv Detail & Related papers (2020-12-13T12:38:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.