Putting People in Their Place: Affordance-Aware Human Insertion into
Scenes
- URL: http://arxiv.org/abs/2304.14406v1
- Date: Thu, 27 Apr 2023 17:59:58 GMT
- Title: Putting People in Their Place: Affordance-Aware Human Insertion into
Scenes
- Authors: Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan
Lu, Alexei A. Efros, Krishna Kumar Singh
- Abstract summary: We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes.
Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances.
Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition.
- Score: 61.63825003487104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of inferring scene affordances by presenting a method
for realistically inserting people into scenes. Given a scene image with a
marked region and an image of a person, we insert the person into the scene
while respecting the scene affordances. Our model can infer the set of
realistic poses given the scene context, re-pose the reference person, and
harmonize the composition. We set up the task in a self-supervised fashion by
learning to re-pose humans in video clips. We train a large-scale diffusion
model on a dataset of 2.4M video clips that produces diverse plausible poses
while respecting the scene context. Given the learned human-scene composition,
our model can also hallucinate realistic people and scenes when prompted
without conditioning and also enables interactive editing. A quantitative
evaluation shows that our method synthesizes more realistic human appearance
and more natural human-scene interactions than prior work.
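To make the conditioning setup described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: a denoiser receives the noisy target frame, the masked scene, the marked region, and a reference-person image, with the self-supervised pair drawn from two frames of the same video clip. The ToyDenoiser network, channel layout, and toy noise schedule are illustrative assumptions only, not the authors' architecture or training code.

```python
# Hypothetical sketch of masked-scene + reference-person conditioning for a
# diffusion-style denoiser, trained self-supervised from video frame pairs.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for the large-scale diffusion model described in the paper."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # 3 (noisy frame) + 3 (masked scene) + 1 (mask) + 3 (reference person) = 10 channels
        self.net = nn.Sequential(
            nn.Conv2d(10, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, noisy, masked_scene, mask, person):
        x = torch.cat([noisy, masked_scene, mask, person], dim=1)
        return self.net(x)  # predicted noise

# Self-supervised pairing from video: one frame supplies the scene (person
# masked out), another frame of the same clip supplies the reference person.
scene_frame = torch.rand(1, 3, 64, 64)    # frame used as scene / reconstruction target
person_frame = torch.rand(1, 3, 64, 64)   # different frame: reference pose and appearance
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0             # marked region where the person should go

masked_scene = scene_frame * (1.0 - mask)
noise = torch.randn_like(scene_frame)
t = 0.5                                   # toy noise level in [0, 1], not a real schedule
noisy_target = (1 - t) * scene_frame + t * noise

model = ToyDenoiser()
pred_noise = model(noisy_target, masked_scene, mask, person_frame)
loss = nn.functional.mse_loss(pred_noise, noise)
loss.backward()
print(f"toy denoising loss: {loss.item():.4f}")
```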
Related papers
- Text2Place: Affordance-aware Text Guided Human Placement [26.041917073228483]
This work tackles the problem of realistic human insertion in a given background scene, termed Semantic Human Placement.
For learning semantic masks, we leverage rich object-scene priors learned from the text-to-image generative models.
The proposed method can generate highly realistic scene compositions while preserving the background and subject identity.
arXiv Detail & Related papers (2024-07-22T08:00:06Z) - Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z) - PixelHuman: Animatable Neural Radiance Fields from Few Images [27.932366091437103]
We propose PixelHuman, a novel rendering model that generates animatable human scenes from a few images of a person.
Our method differs from existing methods in that it can generalize to any input image for animatable human synthesis.
Our experiments show that our method achieves state-of-the-art performance in multiview and novel pose synthesis from few-shot images.
arXiv Detail & Related papers (2023-07-18T08:41:17Z) - Scene Synthesis from Human Motion [26.2618553074691]
We propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion.
Our framework, Scene Synthesis from HUMan MotiON (SUMMON), consists of two steps.
It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion.
arXiv Detail & Related papers (2023-01-04T03:30:46Z) - NeuMan: Neural Human Radiance Field from a Single Video [26.7471970027198]
We train two NeRF models: a human NeRF model and a scene NeRF model.
Our method is able to learn subject-specific details, including cloth wrinkles and accessories, from just a 10-second video clip.
arXiv Detail & Related papers (2022-03-23T17:35:50Z) - Hallucinating Pose-Compatible Scenes [55.064949607528405]
We present a large-scale generative adversarial network for pose-conditioned scene generation.
We curate a massive meta-dataset containing over 19 million frames of humans in everyday environments.
We leverage our trained model for various applications: hallucinating pose-compatible scene(s) with or without humans, visualizing incompatible scenes and poses, placing a person from one generated image into another scene, and animating pose.
arXiv Detail & Related papers (2021-12-13T18:59:26Z) - Comparing Visual Reasoning in Humans and AI [66.89451296340809]
We created a dataset of complex scenes that contained human behaviors and social interactions.
We used a quantitative metric of similarity between the AI's or a human's scene description and a ground truth built from five other human descriptions of each scene.
Results show that machine/human agreement on scene descriptions is much lower than human/human agreement for our complex scenes.
arXiv Detail & Related papers (2021-04-29T04:44:13Z) - Pose-Guided Human Animation from a Single Image in the Wild [83.86903892201656]
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene.
We design a compositional neural network that predicts the silhouette, garment labels, and textures.
We are able to synthesize human animations that can preserve the identity and appearance of the person in a temporally coherent way without any fine-tuning of the network on the testing scene.
arXiv Detail & Related papers (2020-12-07T15:38:29Z) - Long-term Human Motion Prediction with Scene Context [60.096118270451974]
We propose a novel three-stage framework for predicting human motion.
Our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path.
arXiv Detail & Related papers (2020-07-07T17:59:53Z)
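As a rough illustration of the three-stage structure described in this last entry (sample motion goals, plan a 3D path to each goal, predict pose sequences along each path), here is a hypothetical Python sketch; every function below is a placeholder, not the paper's actual models.

```python
# Hypothetical three-stage pipeline: sample goals -> plan paths -> predict poses.
import numpy as np

def sample_goals(scene_points: np.ndarray, num_goals: int) -> np.ndarray:
    """Stage 1 (placeholder): pick candidate 3D goal locations in the scene."""
    idx = np.random.choice(len(scene_points), size=num_goals, replace=False)
    return scene_points[idx]

def plan_path(start: np.ndarray, goal: np.ndarray, steps: int = 30) -> np.ndarray:
    """Stage 2 (placeholder): straight-line path; the paper plans scene-aware 3D paths."""
    return np.linspace(start, goal, steps)

def predict_poses(path: np.ndarray, num_joints: int = 17) -> np.ndarray:
    """Stage 3 (placeholder): emit a pose (joint positions) per path waypoint."""
    return path[:, None, :] + 0.1 * np.random.randn(len(path), num_joints, 3)

scene_points = np.random.rand(1000, 3) * 5.0   # dummy scene point cloud
start = np.zeros(3)
for goal in sample_goals(scene_points, num_goals=3):
    path = plan_path(start, goal)
    poses = predict_poses(path)
    print(f"goal {np.round(goal, 2)} -> pose sequence of shape {poses.shape}")
```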
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.