Self-Supervised Human Depth Estimation from Monocular Videos
- URL: http://arxiv.org/abs/2005.03358v1
- Date: Thu, 7 May 2020 09:45:11 GMT
- Title: Self-Supervised Human Depth Estimation from Monocular Videos
- Authors: Feitong Tan, Hao Zhu, Zhaopeng Cui, Siyu Zhu, Marc Pollefeys, Ping Tan
- Abstract summary: Previous methods on estimating detailed human depth often require supervised training with 'ground truth' depth data.
This paper presents a self-supervised method that can be trained on YouTube videos without known depth.
Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.
- Score: 99.39414134919117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous methods on estimating detailed human depth often require supervised
training with 'ground truth' depth data. This paper presents a self-supervised
method that can be trained on YouTube videos without known depth, which makes
training data collection simple and improves the generalization of the learned
network. The self-supervised learning is achieved by minimizing a
photo-consistency loss, which is evaluated between a video frame and its
neighboring frames warped according to the estimated depth and the 3D non-rigid
motion of the human body. To solve this non-rigid motion, we first estimate a
rough SMPL model at each video frame and compute the non-rigid body motion
accordingly, which enables self-supervised learning on estimating the shape
details. Experiments demonstrate that our method enjoys better generalization
and performs much better on data in the wild.
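The warping step behind the photo-consistency loss can be sketched as: back-project each pixel to 3D with the estimated depth, displace it by the estimated non-rigid body motion, re-project it into a neighboring frame, and penalize the photometric difference. The sketch below is a minimal, hypothetical NumPy version (function name, nearest-neighbour sampling, and the assumption of a shared camera frame are mine, not the paper's; real methods use differentiable bilinear sampling inside a network):

```python
import numpy as np

def photo_consistency_loss(frame, neighbor, depth, motion, K, K_inv):
    """Hypothetical sketch of a photo-consistency loss.

    frame, neighbor: (H, W) grayscale images
    depth:           (H, W) estimated depth for `frame`
    motion:          (H, W, 3) estimated non-rigid 3D motion of each pixel's
                     surface point from `frame` to `neighbor` (e.g. derived
                     from per-frame SMPL fits)
    K, K_inv:        3x3 camera intrinsics and its inverse
    """
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3 x N

    # Back-project pixels to 3D camera-space points using the estimated depth.
    pts = (K_inv @ pix) * depth.reshape(1, -1)

    # Apply the estimated non-rigid 3D motion of the body surface.
    pts = pts + motion.reshape(-1, 3).T

    # Re-project the moved points into the neighboring frame.
    proj = K @ pts
    z = np.clip(proj[2], 1e-6, None)
    u, v = proj[0] / z, proj[1] / z

    # Nearest-neighbour sampling; a trainable version would use
    # differentiable bilinear sampling instead.
    ui = np.clip(np.round(u).astype(int), 0, W - 1)
    vi = np.clip(np.round(v).astype(int), 0, H - 1)
    warped = neighbor[vi, ui].reshape(H, W)

    # Photometric (L1) difference between the frame and the warped neighbor.
    return np.abs(frame - warped).mean()
```

With zero motion, correct depth, and a static camera, the warp is the identity and the loss is zero; training drives the predicted depth and motion toward that condition.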
Related papers
- MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild [32.6521941706907]
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos.
We first define a layered neural representation for the entire scene, composited by individual human and background models.
We learn the layered neural representation from videos via our layer-wise differentiable volume rendering.
arXiv Detail & Related papers (2024-06-03T17:59:57Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- Motion Degeneracy in Self-supervised Learning of Elevation Angle Estimation for 2D Forward-Looking Sonar [4.683630397028384]
This study aims to realize stable self-supervised learning of elevation angle estimation without pretraining using synthetic images.
We first analyze the motion field of 2D forward-looking sonar, which is related to the main supervision signal.
arXiv Detail & Related papers (2023-07-30T08:06:11Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train networks; however, that assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model to generate a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
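Depth predicted only up to an unknown scale and shift is commonly aligned to a metric reference with a closed-form least-squares fit. This is a generic illustrative sketch of that standard alignment, not the paper's method (which predicts the shift and focal length from 3D point cloud data); the function name is hypothetical:

```python
import numpy as np

def align_scale_shift(pred, ref):
    """Solve min over (s, t) of || s * pred + t - ref ||^2 in closed form."""
    # Stack the flattened prediction with a column of ones so lstsq
    # recovers both the scale s and the shift t.
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    s, t = np.linalg.lstsq(A, ref.ravel(), rcond=None)[0]
    return s * pred + t, (s, t)
```

Given a reference depth map, this recovers the affine ambiguity exactly when the prediction is correct up to scale and shift, and gives the least-squares best fit otherwise.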
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Self-Supervised 3D Human Pose Estimation with Multiple-View Geometry [2.7541825072548805]
We present a self-supervised learning algorithm for 3D human pose estimation of a single person based on a multiple-view camera system.
We propose a four-loss function learning algorithm, which does not require any 2D or 3D body pose ground-truth.
arXiv Detail & Related papers (2021-08-17T17:31:24Z)
- A Plug-and-play Scheme to Adapt Image Saliency Deep Model for Video Data [54.198279280967185]
This paper proposes a novel plug-and-play scheme to weakly retrain a pretrained image saliency deep model for video data.
Our method is simple yet effective for adapting any off-the-shelf pre-trained image saliency deep model to obtain high-quality video saliency detection.
arXiv Detail & Related papers (2020-08-02T13:23:14Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
- Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows [43.89097619822221]
We present semi-supervised and self-supervised models that support training and good generalization in real-world images and video.
Our formulation is based on kinematic latent normalizing flow representations and dynamics, as well as differentiable, semantic body part alignment loss functions.
In extensive experiments using 3D motion capture datasets such as CMU, Human3.6M, 3DPW, and AMASS, we show that the proposed methods outperform the state of the art.
arXiv Detail & Related papers (2020-03-23T16:11:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.