DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes
- URL: http://arxiv.org/abs/2108.05615v1
- Date: Thu, 12 Aug 2021 09:12:39 GMT
- Title: DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes
- Authors: Dongki Jung, Jaehoon Choi, Yonghan Lee, Deokhwa Kim, Changick Kim,
Dinesh Manocha, Donghwan Lee
- Abstract summary: We present a novel approach for estimating depth from a monocular camera as it moves through complex indoor environments.
Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people.
- Score: 68.38952377590499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel approach for estimating depth from a monocular camera as
it moves through complex and crowded indoor environments, e.g., a department
store or a metro station. Our approach predicts absolute scale depth maps over
the entire scene consisting of a static background and multiple moving people,
by training on dynamic scenes. Since it is difficult to collect dense depth
maps from crowded indoor environments, we design our training framework without
requiring depths produced from depth sensing devices. Our network leverages RGB
images and sparse depth maps generated from traditional 3D reconstruction
methods to estimate dense depth maps. We use two constraints to handle depth
for non-rigidly moving people without tracking their motion explicitly. We
demonstrate that our approach offers consistent improvements over recent depth
estimation methods on the NAVERLABS dataset, which includes complex and crowded
scenes.
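
The abstract describes the training signal only at a high level: the network is supervised by sparse depth points obtained from traditional 3D reconstruction rather than by depth sensors. Below is a minimal, hypothetical sketch (in PyTorch) of that kind of sparse-depth supervision. The network, tensor shapes, and names are placeholders, not the authors' implementation, and the paper's two constraints for non-rigidly moving people are not modeled here.

```python
import torch
import torch.nn as nn

def sparse_depth_loss(pred_depth, sparse_depth):
    """L1 loss evaluated only at pixels where sparse (SfM) depth exists.

    pred_depth:   (B, 1, H, W) dense depth predicted by the network
    sparse_depth: (B, 1, H, W) sparse absolute-scale depth, 0 where unobserved
    """
    valid = sparse_depth > 0                      # mask of reconstructed points
    if valid.sum() == 0:                          # no supervision in this batch
        return pred_depth.sum() * 0.0
    return torch.abs(pred_depth[valid] - sparse_depth[valid]).mean()

# Toy usage: any encoder-decoder depth network could stand in for depth_net.
depth_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1), nn.Softplus())
rgb = torch.rand(2, 3, 64, 64)                    # RGB frames
sparse = torch.zeros(2, 1, 64, 64)                # sparse depth from 3D reconstruction
sparse[:, :, ::8, ::8] = torch.rand(2, 1, 8, 8) * 5.0
loss = sparse_depth_loss(depth_net(rgb), sparse)
loss.backward()
```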
Related papers
- Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering [4.717325308876748]
We present a novel approach to generate view consistent and detailed depth maps from a number of posed images.
We leverage advances in monocular depth estimation, which produce topologically complete but metrically inaccurate depth maps.
Our method is able to generate dense, detailed, high-quality depth maps, also in challenging indoor scenarios, and outperforms state-of-the-art depth reconstruction approaches.
arXiv Detail & Related papers (2024-10-04T18:50:28Z)
- Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation [23.93080319283679]
Existing methods jointly estimate pixel-wise depth and motion, relying mainly on an image reconstruction loss.
Dynamic regions remain a critical challenge for these methods due to the inherent ambiguity in depth and motion estimation.
This paper proposes a self-supervised training framework exploiting pseudo depth labels for dynamic regions from training data.
arXiv Detail & Related papers (2024-04-23T10:51:15Z)
- R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [106.52409577316389]
R3D3 is a multi-camera system for dense 3D reconstruction and ego-motion estimation.
Our approach exploits spatial-temporal information from multiple cameras together with monocular depth refinement.
We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments.
arXiv Detail & Related papers (2023-08-28T17:13:49Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
However, it relies on the multi-view consistency assumption for training, which is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes [85.56602190773684]
We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view.
By minimizing the error between the synthetic image and the corresponding real image in a video, the deep network that predicts pose and depth can be trained in a completely unsupervised manner (a minimal sketch of this loss appears after this list).
arXiv Detail & Related papers (2021-05-05T17:08:10Z)
- DynOcc: Learning Single-View Depth from Dynamic Occlusion Cues [37.837552043766166]
We introduce DynOcc, the first depth dataset consisting of dynamic in-the-wild scenes.
Our approach leverages the cues in these dynamic scenes to infer depth relationships between points of selected video frames.
In total, our DynOcc dataset contains 22M depth pairs from 91K frames drawn from a diverse set of videos.
arXiv Detail & Related papers (2021-03-30T22:17:36Z)
- Self-Attention Dense Depth Estimation Network for Unrectified Video Sequences [6.821598757786515]
LiDAR and radar sensors are the typical hardware solutions for real-time depth estimation.
Deep learning based self-supervised depth estimation methods have shown promising results.
We propose a self-attention based depth and ego-motion network for unrectified images.
arXiv Detail & Related papers (2020-05-28T21:53:53Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
- Depth Map Estimation of Dynamic Scenes Using Prior Depth Information [14.03714478207425]
We propose an algorithm that estimates depth maps using concurrently collected images and a previously measured depth map for dynamic scenes.
Our goal is to balance the acquisition of depth between the active depth sensor and computation, without incurring a large computational cost.
Our approach can obtain dense depth maps at up to real-time (30 FPS) on a standard laptop computer, which is orders of magnitude faster than similar approaches.
arXiv Detail & Related papers (2020-02-02T01:04:27Z)
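
The Moving SLAM entry above describes the view-synthesis objective commonly used for self-supervised depth and pose learning. As a rough illustration of that idea (not the paper's code), the sketch below warps a source frame into the target view using a predicted depth map, camera intrinsics, and a predicted relative pose, then measures the photometric error; all names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def view_synthesis_loss(img_src, img_tgt, depth_tgt, T_src_tgt, K):
    """Photometric (L1) error after warping the source frame into the target view.

    img_src, img_tgt: (B, 3, H, W) video frames
    depth_tgt:        (B, 1, H, W) predicted depth of the target frame
    T_src_tgt:        (B, 4, 4)   predicted relative pose (source <- target)
    K:                (B, 3, 3)   camera intrinsics
    """
    B, _, H, W = img_tgt.shape
    # Homogeneous pixel grid of the target view.
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1).expand(B, -1, -1)
    # Back-project to 3D with the predicted depth, then move into the source camera.
    cam = (torch.linalg.inv(K) @ pix) * depth_tgt.reshape(B, 1, -1)
    cam = torch.cat([cam, torch.ones(B, 1, H * W)], dim=1)
    cam_src = (T_src_tgt @ cam)[:, :3]
    # Re-project into the source image and normalise coordinates to [-1, 1].
    proj = K @ cam_src
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    synth = F.grid_sample(img_src, grid, align_corners=True)  # synthesised target view
    return torch.abs(synth - img_tgt).mean()
```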