Learning Complex 3D Human Self-Contact
- URL: http://arxiv.org/abs/2012.10366v1
- Date: Fri, 18 Dec 2020 17:09:34 GMT
- Title: Learning Complex 3D Human Self-Contact
- Authors: Mihai Fieraru, Mihai Zanfir, Elisabeta Oneata, Alin-Ionut Popa, Vlad
Olaru, Cristian Sminchisescu
- Abstract summary: Existing 3d reconstruction methods do not focus on body regions in self-contact.
We develop a model for Self-Contact Prediction that estimates the body surface signature of self-contact.
We show how more expressive 3d reconstructions can be recovered under self-contact signature constraints.
- Score: 33.83748199524761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular estimation of three dimensional human self-contact is fundamental
for detailed scene analysis including body language understanding and behaviour
modeling. Existing 3d reconstruction methods do not focus on body regions in
self-contact and consequently recover configurations that are either far from
each other or self-intersecting, when they should just touch. This leads to
perceptually incorrect estimates and limits impact in those very fine-grained
analysis domains where detailed 3d models are expected to play an important
role. To address such challenges we detect self-contact and design 3d losses to
explicitly enforce it. Specifically, we develop a model for Self-Contact
Prediction (SCP), that estimates the body surface signature of self-contact,
leveraging the localization of self-contact in the image, during both training
and inference. We collect two large datasets to support learning and
evaluation: (1) HumanSC3D, an accurate 3d motion capture repository containing
$1,032$ sequences with $5,058$ contact events and $1,246,487$ ground truth 3d
poses synchronized with images collected from multiple views, and (2)
FlickrSC3D, a repository of $3,969$ images, containing $25,297$
surface-to-surface correspondences with annotated image spatial support. We
also illustrate how more expressive 3d reconstructions can be recovered under
self-contact signature constraints and present monocular detection of
face-touch as one of the multiple applications made possible by more accurate
self-contact models.
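The abstract describes designing 3d losses that explicitly enforce self-contact between surface regions identified by the predicted contact signature. A minimal sketch of one such loss is below, assuming a mesh given as an (N, 3) vertex array and a signature expressed as pairs of vertex indices that should touch; the function name, signature format, and hinge formulation are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def self_contact_loss(vertices, contact_pairs, eps=1e-3):
    """Penalize distance between vertex pairs predicted to be in contact.

    vertices:      (N, 3) array of 3d mesh vertex positions
    contact_pairs: (K, 2) integer array of (i, j) surface correspondences
    eps:           tolerance below which contact counts as satisfied
    """
    vi = vertices[contact_pairs[:, 0]]        # (K, 3) first surface points
    vj = vertices[contact_pairs[:, 1]]        # (K, 3) matching points
    dists = np.linalg.norm(vi - vj, axis=1)   # (K,) pairwise distances
    # Hinge: zero loss once the two surface regions are within eps.
    return float(np.mean(np.maximum(dists - eps, 0.0)))

# Toy example: two vertices that should touch but are 0.5 apart.
verts = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.5],
                  [1.0, 0.0, 0.0]])
pairs = np.array([[0, 1]])
loss = self_contact_loss(verts, pairs)  # 0.499
```

In an optimization-based fitting loop, a term like this would be added to the usual reprojection and prior losses, pulling the signed regions together while interpenetration penalties keep them from crossing.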
Related papers
- DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image [98.29284902879652]
We present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image.
It features disentangling the regression of local deformation fields and global mesh locations into two network branches.
It achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility.
arXiv Detail & Related papers (2024-06-26T00:08:29Z)
- Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers
We propose a framework for 3D sequence-to-sequence (seq2seq) human pose estimation.
Firstly, the spatial module represents the human pose feature by intra-image content, while the frame-image relation module extracts temporal relationships.
Our method is evaluated on Human3.6M, a popular 3D human pose estimation dataset.
arXiv Detail & Related papers (2024-01-30T03:00:25Z)
- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models [8.933560282929726]
We introduce a novel affordance representation, named Comprehensive Affordance (ComA).
Given a 3D object mesh, ComA models the distribution of relative orientation and proximity of vertices in interacting human meshes.
We demonstrate that ComA outperforms competitors that rely on human annotations in modeling contact-based affordance.
arXiv Detail & Related papers (2024-01-23T18:59:59Z)
- Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z)
- DECO: Dense Estimation of 3D Human-Scene Contact In The Wild [54.44345845842109]
We train a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate contact on the SMPL body.
We significantly outperform existing SOTA methods across all benchmarks.
We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images.
arXiv Detail & Related papers (2023-09-26T21:21:07Z)
- Reconstructing Three-Dimensional Models of Interacting Humans [38.26269716290761]
CHI3D is an accurate lab-based 3d motion capture dataset with $631$ sequences containing $2,525$ contact events.
FlickrCI3D is a dataset of $11,216$ images, with $14,081$ processed pairs of people, and $81,233$ facet-level surface correspondences.
arXiv Detail & Related papers (2023-08-03T16:20:33Z)
- S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show benefits from using a contact map that constrains hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z)
- On Self-Contact and Human Pose [50.96752167102025]
We develop new datasets and methods that significantly improve human pose estimation with self-contact.
We show that the new self-contact training data significantly improves 3D human pose estimates on withheld test data and existing datasets like 3DPW.
arXiv Detail & Related papers (2021-04-07T15:10:38Z)
- SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation [46.85865451812981]
We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm.
Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
arXiv Detail & Related papers (2020-08-26T09:56:07Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.