Text2Human: Text-Driven Controllable Human Image Generation
- URL: http://arxiv.org/abs/2205.15996v1
- Date: Tue, 31 May 2022 17:57:06 GMT
- Title: Text2Human: Text-Driven Controllable Human Image Generation
- Authors: Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu
- Abstract summary: Existing generative models often fall short under the high diversity of clothing shapes and textures.
We present a text-driven controllable framework, Text2Human, for high-quality and diverse human image generation.
- Score: 98.34326708923284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating high-quality and diverse human images is an important yet
challenging task in vision and graphics. However, existing generative models
often fall short under the high diversity of clothing shapes and textures.
Furthermore, the generation process should ideally be intuitively controllable
for lay users. In this work, we present a text-driven controllable framework,
Text2Human, for high-quality and diverse human image generation. We synthesize
full-body human images starting from a given human pose in two dedicated
steps. 1) Given texts describing the shapes of clothes, the human pose is
first translated into a human parsing map. 2) The final human image is then
generated by providing the system with additional attributes describing the
textures of clothes. Specifically, to model the diversity of clothing
textures, we build a hierarchical texture-aware codebook that stores
multi-scale neural representations for each type of texture. The codebook at
the coarse level captures the structural representations of textures, while
the codebook at the fine level focuses on texture details. To make use of the
learned hierarchical codebook to synthesize desired images, a diffusion-based
transformer sampler with mixture-of-experts is first employed to sample
indices from the coarsest level of the codebook, which are then used to
predict the indices of the codebook at finer levels. The predicted indices at
different levels are translated into human images by a decoder learned jointly
with the hierarchical codebooks. The mixture-of-experts allows the generated
image to be conditioned on fine-grained text input, while the prediction of
finer-level indices refines the quality of clothing textures. Extensive
quantitative and qualitative evaluations demonstrate that our proposed
framework can generate more diverse and realistic human images than
state-of-the-art methods.
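The coarse-to-fine sampling described in the abstract can be illustrated with a short, self-contained sketch. The PyTorch code below is only an illustration under stated assumptions: the codebook sizes, the linear prediction heads, the greedy argmax sampling, and the toy decoder are placeholders standing in for the paper's diffusion-based transformer sampler with mixture-of-experts and its learned decoder; it is not the authors' implementation.

```python
# Minimal sketch of two-level (coarse-to-fine) codebook sampling.
# All sizes and modules are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class HierarchicalCodebook(nn.Module):
    """Two-level codebook: coarse entries model texture structure,
    fine entries model texture detail. Sizes are placeholders."""
    def __init__(self, n_coarse=512, n_fine=1024, dim=256):
        super().__init__()
        self.coarse = nn.Embedding(n_coarse, dim)
        self.fine = nn.Embedding(n_fine, dim)


class CoarseToFineSampler(nn.Module):
    """Stand-in for the diffusion-based transformer sampler: predict coarse
    indices from a conditioning embedding (pose + texture text), then predict
    fine indices from the looked-up coarse codes."""
    def __init__(self, n_coarse=512, n_fine=1024, dim=256):
        super().__init__()
        self.to_coarse = nn.Linear(dim, n_coarse)
        self.to_fine = nn.Linear(dim, n_fine)

    def forward(self, cond_emb, codebook):
        coarse_idx = self.to_coarse(cond_emb).argmax(-1)   # (B, T); greedy for brevity
        coarse_emb = codebook.coarse(coarse_idx)           # (B, T, dim)
        fine_idx = self.to_fine(coarse_emb).argmax(-1)     # fine indices conditioned on coarse codes
        fine_emb = codebook.fine(fine_idx)                 # (B, T, dim)
        return coarse_emb, fine_emb


if __name__ == "__main__":
    dim = 256
    codebook = HierarchicalCodebook(dim=dim)
    sampler = CoarseToFineSampler(dim=dim)
    decoder = nn.Linear(2 * dim, 3)                        # toy stand-in for the learned decoder

    cond = torch.randn(1, 16, dim)                         # 16 conditioning tokens (hypothetical)
    coarse_emb, fine_emb = sampler(cond, codebook)
    out = decoder(torch.cat([coarse_emb, fine_emb], -1))   # (1, 16, 3)
    print(out.shape)
```

In the actual pipeline, the fine-level prediction refines clothing texture detail before decoding; the single linear layer above only keeps the sketch runnable end to end.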
Related papers
- TexVocab: Texture Vocabulary-conditioned Human Avatars [42.170169762733835]
TexVocab is a novel avatar representation that constructs a texture vocabulary and associates body poses with texture maps for animation.
Our method is able to create animatable human avatars with detailed and dynamic appearances from RGB videos.
arXiv Detail & Related papers (2024-03-31T01:58:04Z)
- CapHuman: Capture Your Moments in Parallel Universes [60.06408546134581]
We present a new framework named CapHuman.
CapHuman encodes identity features and then learns to align them into the latent space.
We introduce a 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.
arXiv Detail & Related papers (2024-02-01T14:41:59Z)
- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion [64.49276500129092]
TextureDreamer is an image-guided texture synthesis method.
It can transfer relightable textures from a small number of input images to target 3D shapes across arbitrary categories.
arXiv Detail & Related papers (2024-01-17T18:55:49Z)
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and the reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
- PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns [25.209863457090506]
We propose a novel virtual try-on from unconstrained designs (ucVTON) task to enable synthesis of personalized composite clothing on input human images.
Unlike prior work constrained by specific input types, our method allows flexible specification of style (text or image) and texture (full garment, cropped sections, or texture patches) conditions.
arXiv Detail & Related papers (2023-12-07T18:53:18Z)
- Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis [37.32270579534541]
We propose a novel approach for enhancing text-image correspondence by leveraging available semantic layouts.
Our approach achieves higher text-image correspondence than existing text-to-image generation approaches on the Multi-Modal CelebA-HQ and Cityscapes datasets.
arXiv Detail & Related papers (2023-08-16T05:59:33Z)
- Text2Performer: Text-Driven Human Video Generation [97.3849869893433]
Text-driven content creation has evolved into a transformative technique that revolutionizes creativity.
Here we study the task of text-driven human video generation, where a video sequence is synthesized from texts describing the appearance and motions of a target performer.
In this work, we present Text2Performer to generate vivid human videos with articulated motions from texts.
arXiv Detail & Related papers (2023-04-17T17:59:02Z)
- HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.
We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z)