Generative Proxemics: A Prior for 3D Social Interaction from Images
- URL: http://arxiv.org/abs/2306.09337v2
- Date: Tue, 12 Dec 2023 20:35:03 GMT
- Title: Generative Proxemics: A Prior for 3D Social Interaction from Images
- Authors: Lea Müller, Vickie Ye, Georgios Pavlakos, Michael Black, Angjoo Kanazawa
- Abstract summary: Social interaction is a fundamental aspect of human behavior and communication.
We present a novel approach that learns a prior over the 3D proxemics of two people in close social interaction.
Our approach recovers accurate and plausible 3D social interactions from noisy initial estimates, outperforming state-of-the-art methods.
- Score: 32.547187575678464
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Social interaction is a fundamental aspect of human behavior and
communication. The way individuals position themselves in relation to others,
also known as proxemics, conveys social cues and affects the dynamics of social
interaction. Reconstructing such interaction from images presents challenges
because of mutual occlusion and the limited availability of large training
datasets. To address this, we present a novel approach that learns a prior over
the 3D proxemics of two people in close social interaction and demonstrate its use
for single-view 3D reconstruction. We start by creating 3D training data of
interacting people using image datasets with contact annotations. We then model
the proxemics using a novel denoising diffusion model called BUDDI that learns
the joint distribution over the poses of two people in close social
interaction. Sampling from our generative proxemics model produces realistic 3D
human interactions, which we validate through a perceptual study. We use BUDDI
in reconstructing two people in close proximity from a single image without any
contact annotation via an optimization approach that uses the diffusion model
as a prior. Our approach recovers accurate and plausible 3D social interactions
from noisy initial estimates, outperforming state-of-the-art methods. Our code,
data, and model are available at our project website: muelea.github.io/buddi.
Related papers
- G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis [57.07638884476174]
G-HOP is a denoising diffusion based generative prior for hand-object interactions.
We represent the human hand via a skeletal distance field to obtain a representation aligned with the signed distance field for the object.
We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clip and human grasp synthesis.
arXiv Detail & Related papers (2024-04-18T17:59:28Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models [8.933560282929726]
We introduce a novel affordance representation, named Comprehensive Affordance (ComA).
Given a 3D object mesh, ComA models the distribution of relative orientation and proximity of vertices in interacting human meshes.
We demonstrate that ComA outperforms competitors that rely on human annotations in modeling contact-based affordance.
arXiv Detail & Related papers (2024-01-23T18:59:59Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion [29.25063155767897]
This paper addresses a novel task of anticipating 3D human-object interactions (HOIs).
Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions.
Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions.
arXiv Detail & Related papers (2023-08-31T17:59:08Z)
- CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images [10.4286198282079]
We present a method for teaching machines to understand and model the underlying spatial common sense of diverse human-object interactions in 3D.
We show multiple 2D images captured from different viewpoints when humans interact with the same type of objects.
Despite their imperfect image quality compared to real images, we demonstrate that the synthesized images are sufficient to learn the 3D human-object spatial relations.
arXiv Detail & Related papers (2023-08-23T17:59:11Z)
- AROS: Affordance Recognition with One-Shot Human Stances [0.0]
We present AROS, a one-shot learning approach that uses an explicit representation of interactions between human poses and 3D scenes.
Given a 3D mesh of a previously unseen scene, we can predict affordance locations that support the interactions and generate corresponding articulated 3D human bodies around them.
Results show that our one-shot approach outperforms data-intensive baselines by up to 80%.
arXiv Detail & Related papers (2022-10-21T04:29:21Z)
- Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos [49.52070710518688]
We introduce a method to reconstruct the 3D motion of a person interacting with an object from a single RGB video.
Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces on the human body.
arXiv Detail & Related papers (2021-11-02T13:40:18Z)
- Perceiving Humans: from Monocular 3D Localization to Social Distancing [93.03056743850141]
We present a new cost-effective vision-based method that perceives humans' locations in 3D and their body orientation from a single image.
We show that it is possible to rethink the concept of "social distancing" as a form of social interaction in contrast to a simple location-based rule.
arXiv Detail & Related papers (2020-09-01T10:12:30Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.