Decaf: Monocular Deformation Capture for Face and Hand Interactions
- URL: http://arxiv.org/abs/2309.16670v2
- Date: Fri, 13 Oct 2023 15:45:13 GMT
- Title: Decaf: Monocular Deformation Capture for Face and Hand Interactions
- Authors: Soshi Shimada, Vladislav Golyanik, Patrick Pérez, Christian Theobalt
- Abstract summary: This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
- Score: 77.75726740605748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods for 3D tracking from monocular RGB videos predominantly
consider articulated and rigid objects. Modelling dense non-rigid object
deformations in this setting has remained largely unaddressed, although such
effects can improve the realism of downstream applications such as AR/VR
and avatar communications. This is due to the severe ill-posedness of the
monocular view setting and the associated challenges. While it is possible to
naively track multiple non-rigid objects independently using 3D templates or
parametric 3D models, such an approach would suffer from multiple artefacts in
the resulting 3D estimates such as depth ambiguity, unnatural intra-object
collisions and missing or implausible deformations. Hence, this paper
introduces the first method that addresses the fundamental challenges outlined
above and that allows tracking human hands interacting with human faces in 3D
from single monocular RGB videos. We model hands as articulated objects
inducing non-rigid face deformations during an active interaction. Our method
relies on a new hand-face motion and interaction capture dataset with realistic
face deformations acquired with a markerless multi-view camera system. As a
pivotal step in its creation, we process the reconstructed raw 3D shapes with
position-based dynamics and an approach for non-uniform stiffness estimation of
the head tissues, which results in plausible annotations of the surface
deformations, hand-face contact regions and head-hand positions. At the core of
our neural approach are a variational auto-encoder supplying the hand-face
depth prior and modules that guide the 3D tracking by estimating the contacts
and the deformations. Our final 3D hand and face reconstructions are realistic
and more plausible compared to several baselines applicable in our setting,
both quantitatively and qualitatively.
https://vcai.mpi-inf.mpg.de/projects/Decaf
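The dataset-creation step mentioned in the abstract (position-based dynamics with non-uniform stiffness estimation for the head tissues) is only described at a high level here. The snippet below is a minimal, hypothetical sketch of a position-based dynamics constraint-projection step with per-vertex stiffness, intended purely to illustrate the general idea; the function and all parameter names are assumptions, not the authors' implementation.

```python
# Minimal position-based dynamics (PBD) sketch with per-vertex stiffness.
# Hypothetical illustration only; not the Decaf processing pipeline.
import numpy as np

def pbd_project_distance_constraints(positions, edges, rest_lengths,
                                     stiffness, iterations=10):
    """Iteratively project edge-length constraints on a surface mesh.

    positions:    (N, 3) current vertex positions
    edges:        list of (i, j) vertex index pairs
    rest_lengths: (E,) rest length of each edge
    stiffness:    (N,) per-vertex stiffness in [0, 1]; softer tissue
                  (e.g. cheeks) would get lower values than bone-backed
                  regions, mimicking non-uniform head-tissue stiffness
    """
    p = positions.copy()
    for _ in range(iterations):
        for (i, j), rest in zip(edges, rest_lengths):
            d = p[j] - p[i]
            dist = np.linalg.norm(d)
            if dist < 1e-9:
                continue
            # Stiffer edges enforce their rest length more strongly.
            k = 0.5 * (stiffness[i] + stiffness[j])
            corr = k * (dist - rest) * (d / dist)
            p[i] += 0.5 * corr   # equal inverse masses for simplicity
            p[j] -= 0.5 * corr
    return p
```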
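The abstract also refers to a variational auto-encoder that supplies the hand-face depth prior. The architecture is not specified in this listing, so the following PyTorch snippet is only a generic, hypothetical VAE over a hand-face depth feature vector; the input representation, layer sizes and class name are assumptions.

```python
# Hypothetical VAE sketch for a learned hand-face depth prior (PyTorch).
# Dimensions and architecture are illustrative assumptions, not the Decaf networks.
import torch
import torch.nn as nn

class DepthPriorVAE(nn.Module):
    def __init__(self, feat_dim=128, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),   # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)   # reparameterisation trick
        recon = self.decoder(z)
        # The KL term regularises the latent space; at test time the decoder
        # can act as a prior over plausible relative hand-face depths.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl
```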
Related papers
- DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image [98.29284902879652]
We present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image.
It features disentangling the regression of local deformation fields and global mesh locations into two network branches.
It achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility.
arXiv Detail & Related papers (2024-06-26T00:08:29Z)
- SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction [13.417086460511696]
We introduce the SHOWMe dataset which consists of 96 videos, annotated with real and detailed hand-object 3D textured meshes.
We consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence.
This assumption allows us to register sub-millimetre-precise groundtruth 3D scans to the image sequences in SHOWMe.
arXiv Detail & Related papers (2023-09-19T16:48:29Z)
- Implicit Neural Head Synthesis via Controllable Local Deformation Fields [12.191729556779972]
We build on part-based implicit shape models that decompose a global deformation field into local ones.
Our novel formulation models multiple implicit deformation fields with local semantic rig-like control via 3DMM-based parameters.
Our formulation renders sharper locally controllable nonlinear deformations than previous implicit monocular approaches.
arXiv Detail & Related papers (2023-04-21T16:35:28Z)
- MoDA: Modeling Deformable 3D Objects from Casual Videos [84.29654142118018]
We propose neural dual quaternion blend skinning (NeuDBS) to achieve 3D point deformation without skin-collapsing artifacts (a sketch of classical dual quaternion blend skinning follows this list).
In the endeavor to register 2D pixels across different frames, we establish a correspondence between canonical feature embeddings that encodes 3D points within the canonical space.
Our approach can reconstruct 3D models for humans and animals with better qualitative and quantitative performance than state-of-the-art methods.
arXiv Detail & Related papers (2023-04-17T13:49:04Z)
- Neural Capture of Animatable 3D Human from Monocular Video [38.974181971541846]
We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views.
Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model serving as a geometry proxy.
arXiv Detail & Related papers (2022-08-18T09:20:48Z)
- MoCapDeform: Monocular 3D Human Motion Capture in Deformable Scenes [133.3300573151597]
MoCapDeform is a new framework for monocular 3D human motion capture.
It is the first to explicitly model non-rigid deformations of a 3D scene.
It achieves superior accuracy than competing methods on several datasets.
arXiv Detail & Related papers (2022-08-17T17:59:54Z)
- HULC: 3D Human Motion Capture with Pose Manifold Sampling and Dense Contact Guidance [82.09463058198546]
Marker-less monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant for extended reality, robotics and virtual avatar generation.
We propose HULC, a new approach for 3D human MoCap which is aware of the scene geometry.
arXiv Detail & Related papers (2022-05-11T17:59:31Z)
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from single monocular RGB images.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
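For context on the NeuDBS entry above, the classical (non-neural) dual quaternion blend skinning it builds on can be sketched as follows. This is the standard textbook formulation rather than code from any of the listed papers; the [w, x, y, z] quaternion convention and all helper names are illustrative choices.

```python
# Classical dual quaternion blend skinning (DQB) sketch in NumPy.
# Standard formulation for context only; not code from the listed papers.
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions given as [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def quat_to_mat(q):
    """Rotation matrix from a unit quaternion [w, x, y, z]."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def dqb_transform(point, weights, bone_rotations, bone_translations):
    """Skin one point by blending per-bone rigid transforms as dual quaternions.

    weights:           (B,) skinning weights that sum to 1
    bone_rotations:    list of B unit quaternions [w, x, y, z]
    bone_translations: list of B translation vectors (3,)
    """
    r_blend = np.zeros(4)
    d_blend = np.zeros(4)
    ref = bone_rotations[0]
    for w, q_r, t in zip(weights, bone_rotations, bone_translations):
        # Dual part encodes translation: q_d = 0.5 * (0, t) * q_r.
        q_d = 0.5 * quat_mul(np.array([0.0, *t]), q_r)
        # Flip sign to stay in the same hemisphere as the first bone,
        # avoiding antipodal cancellation when blending.
        s = 1.0 if np.dot(q_r, ref) >= 0.0 else -1.0
        r_blend += w * s * q_r
        d_blend += w * s * q_d
    norm = np.linalg.norm(r_blend)
    r_blend /= norm
    d_blend /= norm
    # Recover the blended rigid transform and apply it to the point.
    R = quat_to_mat(r_blend)
    t = 2.0 * quat_mul(d_blend, quat_conj(r_blend))[1:]
    return R @ point + t
```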