Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views
- URL: http://arxiv.org/abs/2304.06024v2
- Date: Sat, 16 Sep 2023 13:30:22 GMT
- Title: Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views
- Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Sadegh Aliakbarian, Darren Cosker,
Siyu Tang
- Abstract summary: We propose a scene-conditioned diffusion method to model the body pose distribution.
The method generates bodies in plausible human-scene interactions.
It achieves superior accuracy for visible joints and diversity for invisible body parts.
- Score: 32.940614931864154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic perception of human behaviors during social interactions is crucial
for AR/VR applications, and an essential component is estimation of plausible
3D human pose and shape of our social partners from the egocentric view. One of
the biggest challenges of this task is severe body truncation due to close
social distances in egocentric scenarios, which brings large pose ambiguities
for unseen body parts. To tackle this challenge, we propose a novel
scene-conditioned diffusion method to model the body pose distribution.
Conditioned on the 3D scene geometry, the diffusion model generates bodies in
plausible human-scene interactions, with the sampling guided by a physics-based
collision score to further resolve human-scene inter-penetrations. The
classifier-free training enables flexible sampling with different conditions
and enhanced diversity. A visibility-aware graph convolution model guided by
per-joint visibility serves as the diffusion denoiser to incorporate
inter-joint dependencies and per-body-part control. Extensive evaluations show
that our method generates bodies in plausible interactions with 3D scenes,
achieving both superior accuracy for visible joints and diversity for invisible
body parts. The code is available at
https://sanweiliti.github.io/egohmr/egohmr.html.
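The sampling recipe in the abstract combines classifier-free guidance over the scene condition with a gradient step on a physics-based collision score at each denoising step. A minimal sketch of that loop follows; the denoiser (a plain MLP here, standing in for the visibility-aware GCN), the collision score, and all dimensions are illustrative assumptions rather than the released EgoHMR code.

```python
import torch

POSE_DIM, SCENE_DIM, T = 63, 128, 100        # assumed sizes
betas = torch.linspace(1e-4, 0.02, T)        # standard DDPM noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class Denoiser(torch.nn.Module):
    """Stand-in for the visibility-aware GCN denoiser (a plain MLP here)."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(POSE_DIM + SCENE_DIM + 1, 256),
            torch.nn.SiLU(),
            torch.nn.Linear(256, POSE_DIM),
        )

    def forward(self, x, t, scene):
        # Condition dropout at train time is what enables this
        # unconditional branch at sampling time.
        if scene is None:
            scene = torch.zeros(x.shape[0], SCENE_DIM)
        t_emb = t.float().view(-1, 1) / T
        return self.net(torch.cat([x, scene, t_emb], dim=-1))

def collision_score(pose, scene):
    """Hypothetical penetration penalty; a real one would pose the body
    and query the scene's signed distance field at its vertices."""
    return pose.pow(2).sum()

def guided_sample(model, scene, cfg_scale=2.0, guide_w=0.05):
    x = torch.randn(1, POSE_DIM)
    for t in reversed(range(T)):
        tb = torch.full((1,), t)
        with torch.no_grad():
            eps_c = model(x, tb, scene)      # scene-conditioned prediction
            eps_u = model(x, tb, None)       # unconditional prediction
        # Classifier-free guidance: extrapolate toward the conditional branch.
        eps = eps_u + cfg_scale * (eps_c - eps_u)
        mean = (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
        # Physics-based guidance: step down the collision-score gradient
        # to resolve human-scene inter-penetration.
        xg = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(collision_score(xg, scene), xg)[0]
        x = (x - guide_w * grad).detach()
    return x

pose = guided_sample(Denoiser(), torch.randn(1, SCENE_DIM))
```

Because the collision gradient is applied to intermediate samples rather than baked into training, the same model can be sampled with or without the scene condition, which is what the classifier-free setup provides.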
Related papers
- Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models [16.259040755335885]
Previous autoregressive 3D scene generation methods have struggled to accurately capture the joint distribution of multiple objects and input humans.
We introduce two spatial collision guidance mechanisms: human-object collision avoidance and object-room boundary constraints.
Our framework can generate more natural and plausible 3D scenes with precise human-scene interactions.
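As a rough sketch of the two guidance mechanisms named above, assuming a `human_sdf` callable (signed distance to the human surface, negative inside) and an axis-aligned room box; neither interface is from the paper:

```python
import torch

def spatial_guidance(obj_points, human_sdf, room_min, room_max):
    """Combined guidance penalty whose gradient steers diffusion samples.
    obj_points: (N, 3) points on generated objects (illustrative names)."""
    # (1) Human-object collision avoidance: penalize object points that
    # fall inside the human (negative signed distance).
    collision = torch.clamp(-human_sdf(obj_points), min=0).pow(2).sum()
    # (2) Object-room boundary constraint: penalize points outside the box.
    below = torch.clamp(room_min - obj_points, min=0)
    above = torch.clamp(obj_points - room_max, min=0)
    boundary = below.pow(2).sum() + above.pow(2).sum()
    return collision + boundary
```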
arXiv Detail & Related papers (2024-06-26T08:18:39Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe human joints in the image and encode fine-grained local features.
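One plausible, DETR-style reading of this token mechanism: a learned human token and joint-related tokens cross-attend to image features in a single stage. Layer choices and dimensions below are assumptions, not the AiOS architecture:

```python
import torch

class TokenProbe(torch.nn.Module):
    def __init__(self, dim=256, num_joints=17):
        super().__init__()
        self.human_token = torch.nn.Parameter(torch.randn(1, 1, dim))
        self.joint_tokens = torch.nn.Parameter(torch.randn(1, num_joints, dim))
        self.decoder = torch.nn.TransformerDecoderLayer(
            d_model=dim, nhead=8, batch_first=True)

    def forward(self, img_feats):                  # img_feats: (B, HW, dim)
        b = img_feats.shape[0]
        queries = torch.cat([self.human_token, self.joint_tokens], dim=1)
        out = self.decoder(queries.expand(b, -1, -1), img_feats)
        # First slot: global per-instance feature; rest: per-joint local features.
        return out[:, :1], out[:, 1:]
```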
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models [8.933560282929726]
We introduce a novel affordance representation, named Comprehensive Affordance (ComA).
Given a 3D object mesh, ComA models the distribution of relative orientation and proximity of vertices in interacting human meshes.
We demonstrate that ComA outperforms competitors that rely on human annotations in modeling contact-based affordance.
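For intuition, here is a toy version of the per-vertex statistics involved: for each object vertex, the distance and direction to the nearest vertex of one interacting human mesh. ComA models full distributions over many such interactions; this computes a single sample's features:

```python
import torch

def proximity_orientation(obj_verts, human_verts):
    """obj_verts: (No, 3), human_verts: (Nh, 3); names are illustrative."""
    dists = torch.cdist(obj_verts, human_verts)   # (No, Nh) pairwise distances
    dmin, idx = dists.min(dim=1)                  # nearest human vertex per object vertex
    offsets = human_verts[idx] - obj_verts        # relative offset vectors
    dirs = offsets / (dmin.unsqueeze(1) + 1e-8)   # unit directions (relative orientation)
    return dmin, dirs
```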
arXiv Detail & Related papers (2024-01-23T18:59:59Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Learning Visibility for Robust Dense Human Body Estimation [78.37389398573882]
Estimating 3D human pose and shape from 2D images is a crucial yet challenging task.
We learn dense human body estimation that is robust to partial observations.
We obtain pseudo ground-truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with 3D coordinates.
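A minimal sketch of that training signal, with assumed shapes and an illustrative masking scheme (not necessarily the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def visibility_aware_loss(pred_xyz, pred_vis_logit, gt_xyz, pseudo_vis):
    """pred_xyz/gt_xyz: (B, N, 3); pred_vis_logit/pseudo_vis: (B, N),
    pseudo_vis a float in {0, 1} derived from dense UV correspondences."""
    # Supervise visibility against the pseudo ground truth.
    vis_loss = F.binary_cross_entropy_with_logits(pred_vis_logit, pseudo_vis)
    # Down-weight 3D supervision on occluded points so partial
    # observations do not corrupt the coordinate regression.
    mask = pseudo_vis.unsqueeze(-1)
    coord_loss = (mask * (pred_xyz - gt_xyz).abs()).sum() / mask.sum().clamp(min=1)
    return coord_loss + vis_loss
```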
arXiv Detail & Related papers (2022-08-23T00:01:05Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
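In spirit, the disentanglement amounts to one implicit decoder conditioned on two separately optimizable codes. A minimal auto-decoder sketch, with sizes and layers assumed:

```python
import torch

class LatentBody(torch.nn.Module):
    """Implicit body model with disentangled shape and pose codes; predicts
    a signed distance for a query point. Sizes/layers are assumptions."""
    def __init__(self, shape_dim=16, pose_dim=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3 + shape_dim + pose_dim, 256),
            torch.nn.Softplus(),
            torch.nn.Linear(256, 1),
        )

    def forward(self, xyz, z_shape, z_pose):
        # The codes are free variables: they can be fit (auto-decoded) to
        # raw, even non-watertight scans, and edited independently after.
        return self.net(torch.cat([xyz, z_shape, z_pose], dim=-1))
```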
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- PLACE: Proximity Learning of Articulation and Contact in 3D Environments [70.50782687884839]
We propose a novel interaction generation method, named PLACE, which explicitly models the proximity between the human body and the 3D scene around it.
Our perceptual study shows that PLACE significantly outperforms the state-of-the-art method, approaching the realism of real human-scene interactions.
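One concrete way to model proximity explicitly is a basis-point-set style encoding, which PLACE builds on: each fixed basis point records its distance to the nearest scene point. A sketch, with the basis sampling left as an assumption:

```python
import torch

def bps_proximity(scene_points, basis_points):
    """Basis-point-set style proximity feature: for each fixed basis point,
    the distance to its nearest scene point. scene_points: (Ns, 3),
    basis_points: (Nb, 3); how the basis is sampled is an assumption."""
    dists = torch.cdist(basis_points, scene_points)   # (Nb, Ns)
    return dists.min(dim=1).values                    # (Nb,) feature vector
```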
arXiv Detail & Related papers (2020-08-12T21:00:10Z)