Hallucinating Pose-Compatible Scenes
- URL: http://arxiv.org/abs/2112.06909v1
- Date: Mon, 13 Dec 2021 18:59:26 GMT
- Title: Hallucinating Pose-Compatible Scenes
- Authors: Tim Brooks, Alexei A. Efros
- Abstract summary: We present a large-scale generative adversarial network for pose-conditioned scene generation.
We curate a massive meta-dataset containing over 19 million frames of humans in everyday environments.
We leverage our trained model for various applications: hallucinating pose-compatible scene(s) with or without humans, visualizing incompatible scenes and poses, placing a person from one generated image into another scene, and animating pose.
- Score: 55.064949607528405
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: What does human pose tell us about a scene? We propose a task to answer this
question: given human pose as input, hallucinate a compatible scene. Subtle
cues captured by human pose -- action semantics, environment affordances,
object interactions -- provide surprising insight into which scenes are
compatible. We present a large-scale generative adversarial network for
pose-conditioned scene generation. We significantly scale the size and
complexity of training data, curating a massive meta-dataset containing over 19
million frames of humans in everyday environments. We double the capacity of
our model with respect to StyleGAN2 to handle such complex data, and design a
pose conditioning mechanism that drives our model to learn the nuanced
relationship between pose and scene. We leverage our trained model for various
applications: hallucinating pose-compatible scene(s) with or without humans,
visualizing incompatible scenes and poses, placing a person from one generated
image into another scene, and animating pose. Our model produces diverse
samples and outperforms pose-conditioned StyleGAN2 and Pix2Pix baselines in
terms of accurate human placement (percent of correct keypoints) and image
quality (Fréchet inception distance).
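The abstract evaluates human placement with the percent-of-correct-keypoints (PCK) metric. As a minimal illustration, here is a sketch of PCK under common assumptions: keypoints are given as `(N, K, 2)` coordinate arrays and each sample has a reference distance (e.g. torso or head size) used to normalize the error threshold. The function name, array layout, and `alpha` default are illustrative choices, not taken from the paper.

```python
import numpy as np

def pck(pred, gt, ref_dist, alpha=0.5, visible=None):
    """Percent of Correct Keypoints (a standard pose-accuracy metric).

    pred, gt : (N, K, 2) predicted / ground-truth keypoint coordinates.
    ref_dist : (N,) per-sample normalization distance (e.g. torso size).
    alpha    : a keypoint is "correct" if its error is <= alpha * ref_dist.
    visible  : optional (N, K) boolean mask of annotated keypoints.
    """
    # Euclidean error per keypoint, shape (N, K)
    err = np.linalg.norm(pred - gt, axis=-1)
    # Threshold each sample by its own reference distance
    correct = err <= alpha * ref_dist[:, None]
    if visible is not None:
        correct = correct[visible]
    return correct.mean()
```

With `alpha=0.5` and a reference distance of 1.0, a keypoint off by 0.4 counts as correct while one off by 2.0 does not, so a two-keypoint pose with those errors scores 0.5.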
Related papers
- PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation [38.958695275774616]
We introduce a new transformer-based model, trained in a retrieval fashion, which can take as input any combination of the aforementioned modalities.
We showcase the potential of such an embroidered pose representation for (1) SMPL regression from an image with an optional text cue; and (2) the task of fine-grained instruction generation.
arXiv Detail & Related papers (2024-09-10T14:09:39Z)
- UniHuman: A Unified Model for Editing Human Images in the Wild [49.896715833075106]
We propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings.
To enhance the model's generation quality and generalization capacity, we leverage guidance from human visual encoders.
In user studies, UniHuman is preferred by users in an average of 77% of cases.
arXiv Detail & Related papers (2023-12-22T05:00:30Z)
- Putting People in Their Place: Affordance-Aware Human Insertion into Scenes [61.63825003487104]
We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes.
Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances.
Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition.
arXiv Detail & Related papers (2023-04-27T17:59:58Z)
- Embodied Scene-aware Human Pose Estimation [25.094152307452]
We propose embodied scene-aware human pose estimation.
Our method is one-stage, causal, and recovers global 3D human poses in a simulated environment.
arXiv Detail & Related papers (2022-06-18T03:50:19Z)
- HumanGAN: A Generative Model of Human Images [78.6284090004218]
We present a generative model for images of dressed humans offering control over pose, local body part appearance, and garment style.
Our model encodes part-based latent appearance vectors in a normalized pose-independent space and warps them to different poses; it preserves body and clothing appearance under varying posture.
arXiv Detail & Related papers (2021-03-11T19:00:38Z)
- PISE: Person Image Synthesis and Editing with Decoupled GAN [64.70360318367943]
We propose PISE, a novel two-stage generative model for Person Image Synthesis and Editing.
For human pose transfer, we first synthesize a human parsing map aligned with the target pose to represent the shape of clothing.
To decouple the shape and style of clothing, we propose joint global and local per-region encoding and normalization.
arXiv Detail & Related papers (2021-03-06T04:32:06Z)
- Holistic 3D Human and Scene Mesh Estimation from Single View Images [5.100152971410397]
We propose an end-to-end trainable model that perceives the 3D scene from a single RGB image.
We show that our model outperforms existing human body mesh methods and indoor scene reconstruction methods.
arXiv Detail & Related papers (2020-12-02T23:22:03Z)
- Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement [63.853412753242615]
Learning a good 3D human pose representation is important for human pose related tasks.
We propose a novel Siamese denoising autoencoder to learn a 3D pose representation.
Our approach achieves state-of-the-art performance on two inherently different tasks.
arXiv Detail & Related papers (2020-07-14T14:25:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.