DeepCap: Monocular Human Performance Capture Using Weak Supervision
- URL: http://arxiv.org/abs/2003.08325v1
- Date: Wed, 18 Mar 2020 16:39:56 GMT
- Title: DeepCap: Monocular Human Performance Capture Using Weak Supervision
- Authors: Marc Habermann, Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll,
Christian Theobalt
- Abstract summary: We propose a novel deep learning approach for monocular dense human performance capture.
Our method is trained in a weakly supervised manner based on multi-view supervision.
Our approach outperforms the state of the art in terms of quality and robustness.
- Score: 106.50649929342576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human performance capture is a highly important computer vision problem with
many applications in movie production and virtual/augmented reality. Many
previous performance capture approaches either required expensive multi-view
setups or did not recover dense space-time coherent geometry with
frame-to-frame correspondences. We propose a novel deep learning approach for
monocular dense human performance capture. Our method is trained in a weakly
supervised manner based on multi-view supervision completely removing the need
for training data with 3D ground truth annotations. The network architecture is
based on two separate networks that disentangle the task into a pose estimation
and a non-rigid surface deformation step. Extensive qualitative and
quantitative evaluations show that our approach outperforms the state of the
art in terms of quality and robustness.
Related papers
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular
Videos in the Wild [10.849750765175754]
POTR-3D is a sequence-to-sequence 2D-to-3D lifting model for 3DMPPE.
It robustly generalizes to diverse unseen views, robustly recovers the poses against heavy occlusions, and reliably generates more natural and smoother outputs.
arXiv Detail & Related papers (2023-09-15T06:17:22Z) - Deformable Model-Driven Neural Rendering for High-Fidelity 3D
Reconstruction of Human Heads Under Low-View Settings [20.07788905506271]
Reconstructing 3D human heads in low-view settings presents technical challenges.
We propose geometry decomposition and adopt a two-stage, coarse-to-fine training strategy.
Our method outperforms existing neural rendering approaches in terms of reconstruction accuracy and novel view synthesis under low-view settings.
arXiv Detail & Related papers (2023-03-24T08:32:00Z) - SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks, however, that is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - A Deeper Look into DeepCap [96.67706102518238]
We propose a novel deep learning approach for monocular dense human performance capture.
Our method is trained in a weakly supervised manner based on multi-view supervision.
Our approach outperforms the state of the art in terms of quality and robustness.
arXiv Detail & Related papers (2021-11-20T11:34:33Z) - DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time varying surface details without the need of using pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z) - SPCNet:Spatial Preserve and Content-aware Network for Human Pose
Estimation [3.2540745519652434]
We propose a novel Spatial Preserve and Content-aware Network(SPCNet), which includes two effective modules: Dilated Hourglass Module(DHM) and Selective Information Module(SIM)
In particular, we exceed previous methods and achieve the state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-04-13T09:14:00Z) - 3D Human Pose Estimation using Spatio-Temporal Networks with Explicit
Occlusion Training [40.933783830017035]
Estimating 3D poses from a monocular task is still a challenging task, despite the significant progress that has been made in recent years.
We introduce a-temporal video network for robust 3D human pose estimation.
We apply multi-scale spatial features for 2D joints or keypoints prediction in each individual frame, and multistride temporal convolutional net-works (TCNs) to estimate 3D joints or keypoints.
arXiv Detail & Related papers (2020-04-07T09:12:12Z) - Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the
Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.