Adversarial Synthesis of Human Pose from Text
- URL: http://arxiv.org/abs/2005.00340v2
- Date: Fri, 16 Oct 2020 09:38:08 GMT
- Title: Adversarial Synthesis of Human Pose from Text
- Authors: Yifei Zhang, Rania Briq, Julian Tanke, Juergen Gall
- Abstract summary: This work focuses on synthesizing human poses from human-level text descriptions.
We propose a model that is based on a conditional generative adversarial network.
We show through qualitative and quantitative results that the model is capable of synthesizing plausible poses matching the given text.
- Score: 18.02001711736337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work focuses on synthesizing human poses from human-level text
descriptions. We propose a model that is based on a conditional generative
adversarial network. It is designed to generate 2D human poses conditioned on
human-written text descriptions. The model is trained and evaluated using the
COCO dataset, which consists of images capturing complex everyday scenes with
various human poses. We show through qualitative and quantitative results that
the model is capable of synthesizing plausible poses matching the given text,
indicating that it is possible to generate poses that are consistent with the
given semantic features, especially for actions with distinctive poses.
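The conditional-GAN formulation described in the abstract can be sketched with a tiny NumPy model. This is a minimal illustration, not the paper's architecture: the layer sizes, the 17-keypoint COCO layout, and the text-embedding dimension are all assumptions, and the text encoder is replaced by a random stand-in vector.

```python
import numpy as np

# Hypothetical sizes: 17 COCO keypoints, 128-d text embedding, 64-d noise.
N_KPTS, TXT_DIM, Z_DIM, HID = 17, 128, 64, 256

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Return randomly initialized weights and a zero bias for one layer."""
    return rng.normal(0, 0.02, (in_dim, out_dim)), np.zeros(out_dim)

G1 = linear(Z_DIM + TXT_DIM, HID)   # generator: (noise + text) -> hidden
G2 = linear(HID, N_KPTS * 2)        # hidden -> (x, y) per keypoint
D1 = linear(N_KPTS * 2 + TXT_DIM, HID)  # discriminator: (pose + text) -> hidden
D2 = linear(HID, 1)                 # hidden -> real/fake score

def generator(z, txt):
    """Map noise plus a text embedding to 2D keypoint coordinates."""
    h = np.maximum(0, np.concatenate([z, txt]) @ G1[0] + G1[1])  # ReLU
    return np.tanh(h @ G2[0] + G2[1]).reshape(N_KPTS, 2)  # coords in [-1, 1]

def discriminator(pose, txt):
    """Score how plausible a pose is for the given text embedding."""
    h = np.maximum(0, np.concatenate([pose.ravel(), txt]) @ D1[0] + D1[1])
    return 1 / (1 + np.exp(-(h @ D2[0] + D2[1])))  # sigmoid probability

txt_emb = rng.normal(size=TXT_DIM)  # stand-in for an encoded caption
pose = generator(rng.normal(size=Z_DIM), txt_emb)
score = discriminator(pose, txt_emb)
print(pose.shape, float(score))
```

During training, the discriminator would be shown (real pose, matching text) pairs against generated or mismatched pairs, which is what pushes the generator toward poses consistent with the caption.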
Related papers
- PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation [38.958695275774616]
This model aligns three pose-related modalities: 3D pose, image, and text description.
We introduce a new transformer-based model, trained in a retrieval fashion, which can take as input any combination of these modalities.
We showcase the potential of such an embroidered pose representation for (1) SMPL regression from image with optional text cue; and (2) on the task of fine-grained instruction generation.
arXiv Detail & Related papers (2024-09-10T14:09:39Z)
- Diversifying Human Pose in Synthetic Data for Aerial-view Human Detection [16.42439177494448]
We present a framework for diversifying human poses in a synthetic dataset for aerial-view human detection.
Our method constructs a set of novel poses using a pose generator and then alters images in the existing synthetic dataset to assume the novel poses.
Experiments demonstrate that, regardless of how the synthetic data is used for training or the data size, leveraging the pose-diversified dataset in training presents remarkably better accuracy.
arXiv Detail & Related papers (2024-05-24T21:08:27Z)
- Generating Holistic 3D Human Motion from Speech [97.11392166257791]
We build a high-quality dataset of 3D holistic body meshes with synchronous speech.
We then define a novel speech-to-motion generation framework in which the face, body, and hands are modeled separately.
arXiv Detail & Related papers (2022-12-08T17:25:19Z)
- PoseScript: Linking 3D Human Poses and Natural Language [38.85620213438554]
We introduce the PoseScript dataset, which pairs more than six thousand 3D human poses with rich human-annotated descriptions.
To increase the size of the dataset to a scale compatible with data-hungry learning algorithms, we propose an elaborate captioning process.
This process extracts low-level pose information, known as "posecodes", using a set of simple but generic rules on the 3D keypoints.
With automatic annotations, the amount of available data significantly scales up (100k), making it possible to effectively pretrain deep models for finetuning on human captions.
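The rule-based "posecode" extraction described above can be illustrated with one hypothetical rule: classifying an elbow as bent or straight from 3D keypoints. The joint names, the angle threshold, and the category labels are all assumptions for illustration, not PoseScript's actual codebook.

```python
import numpy as np

def angle(a, b, c):
    """Angle at joint b (degrees) between segments b->a and b->c."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def elbow_posecode(shoulder, elbow, wrist, bent_below=120.0):
    """Return a categorical posecode for one arm (threshold is assumed)."""
    return "bent" if angle(shoulder, elbow, wrist) < bent_below else "straight"

# A straight arm along the x-axis vs. a right-angle bend at the elbow.
straight = elbow_posecode(np.array([0.0, 0, 0]),
                          np.array([1.0, 0, 0]),
                          np.array([2.0, 0, 0]))
bent = elbow_posecode(np.array([0.0, 0, 0]),
                      np.array([1.0, 0, 0]),
                      np.array([1.0, 1, 0]))
print(straight, bent)
```

Many such categorical codes, aggregated over joints and combined by templates, are what turn raw 3D keypoints into automatic captions at the 100k scale mentioned above.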
arXiv Detail & Related papers (2022-10-21T08:18:49Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Hallucinating Pose-Compatible Scenes [55.064949607528405]
We present a large-scale generative adversarial network for pose-conditioned scene generation.
We curate a massive meta-dataset containing over 19 million frames of humans in everyday environments.
We leverage our trained model for various applications: hallucinating pose-compatible scene(s) with or without humans, visualizing incompatible scenes and poses, placing a person from one generated image into another scene, and animating pose.
arXiv Detail & Related papers (2021-12-13T18:59:26Z)
- Towards Better Adversarial Synthesis of Human Images from Text [19.743502366461982]
The model's performance is evaluated on the COCO dataset.
We show how using such a shape as input to image synthesis frameworks helps to constrain the network to synthesize humans with realistic human shapes.
arXiv Detail & Related papers (2021-07-05T08:47:51Z)
- PISE: Person Image Synthesis and Editing with Decoupled GAN [64.70360318367943]
We propose PISE, a novel two-stage generative model for Person Image Synthesis and Editing.
For human pose transfer, we first synthesize a human parsing map aligned with the target pose to represent the shape of clothing.
To decouple the shape and style of clothing, we propose joint global and local per-region encoding and normalization.
arXiv Detail & Related papers (2021-03-06T04:32:06Z)
- Pose-Guided Human Animation from a Single Image in the Wild [83.86903892201656]
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene.
We design a compositional neural network that predicts the silhouette, garment labels, and textures.
We are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene.
arXiv Detail & Related papers (2020-12-07T15:38:29Z)
- Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis [58.05389586712485]
We tackle human image synthesis, including human motion imitation, appearance transfer, and novel view synthesis.
In this paper, we propose a 3D body mesh recovery module to disentangle the pose and shape.
We also build a new dataset, namely iPER dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis.
arXiv Detail & Related papers (2020-11-18T02:57:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.