Is my Depth Ground-Truth Good Enough? HAMMER -- Highly Accurate
Multi-Modal Dataset for DEnse 3D Scene Regression
- URL: http://arxiv.org/abs/2205.04565v1
- Date: Mon, 9 May 2022 21:25:09 GMT
- Title: Is my Depth Ground-Truth Good Enough? HAMMER -- Highly Accurate
Multi-Modal Dataset for DEnse 3D Scene Regression
- Authors: HyunJun Jung, Patrick Ruhkamp, Guangyao Zhai, Nikolas Brasch, Yitong
Li, Yannick Verdie, Jifei Song, Yiren Zhou, Anil Armagan, Slobodan Ilic, Ales
Leonardis, Benjamin Busam
- Abstract summary: HAMMER is a dataset comprising depth estimates from multiple commonly used sensors for indoor depth estimation.
We construct highly reliable ground truth depth maps with the help of 3D scanners and aligned renderings.
A popular depth estimator is trained on this data and on typical depth sensors.
- Score: 34.95597838973912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation is a core task in 3D computer vision. Recent methods
investigate monocular depth estimation trained with various depth sensor
modalities. Every sensor has its advantages and drawbacks that stem from the
nature of its estimates. In the literature, mostly the mean absolute error of
the depth is investigated, while sensor capabilities are typically not
discussed. Indoor environments in particular, however, pose challenges for
some devices: textureless regions challenge structure from motion, reflective
materials are problematic for active sensing, and distances to translucent
materials are difficult to measure with existing sensors. This paper proposes HAMMER, a
dataset comprising depth estimates from multiple commonly used sensors for
indoor depth estimation, namely ToF, stereo, structured light together with
monocular RGB+P data. We construct highly reliable ground truth depth maps with
the help of 3D scanners and aligned renderings. A popular depth estimator is
trained on this data and on typical depth sensors. The estimates are
extensively analyzed on different scene structures. We notice generalization
issues arising
from various sensor technologies in household environments with challenging but
everyday scene content. HAMMER, which we make publicly available, provides a
reliable base to pave the way to targeted depth improvements and sensor fusion
approaches.
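As an illustration of the per-structure analysis the abstract describes, here is a minimal sketch (not the authors' evaluation code) that computes the mean absolute depth error separately for the challenging region types named above; the mask names are hypothetical and assume boolean per-pixel region labels:

```python
import numpy as np

def depth_mae_by_region(pred, gt, masks):
    """Mean absolute depth error per scene region.
    pred, gt: (H, W) depth maps in metres; gt == 0 marks missing ground truth.
    masks: dict mapping a region name (e.g. "textureless", "reflective",
    "translucent") to an (H, W) boolean mask."""
    valid = gt > 0                      # evaluate only where ground truth exists
    err = np.abs(pred - gt)
    return {name: float(err[valid & m].mean()) for name, m in masks.items()}
```

Comparing such per-region numbers across sensors is exactly the breakdown that a single global mean error hides.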
Related papers
- SelfReDepth: Self-Supervised Real-Time Depth Restoration for Consumer-Grade Sensors [42.48726526726542]
SelfReDepth is a self-supervised deep learning technique for depth restoration.
It uses multiple sequential depth frames and color data to achieve high-quality depth videos with temporal coherence.
Our results demonstrate our approach's real-time performance on real-world datasets.
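The summary does not spell out the restoration architecture, so the following is only a minimal, non-learned stand-in for the temporal aspect: a per-pixel median over a short window of sequential depth frames, with zero-valued holes ignored (SelfReDepth itself is a learned method that also exploits color data):

```python
import warnings
import numpy as np

def temporal_median_restore(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) depth maps; 0 marks invalid/missing depth."""
    masked = np.where(frames > 0, frames, np.nan)        # treat holes as NaN
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)  # all-NaN pixels
        restored = np.nanmedian(masked, axis=0)          # temporal median per pixel
    return np.nan_to_num(restored, nan=0.0)              # still-missing pixels stay 0
```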
arXiv Detail & Related papers (2024-06-05T15:38:02Z)
- Robust Depth Enhancement via Polarization Prompt Fusion Tuning [112.88371907047396]
We present a framework that leverages polarization imaging to improve inaccurate depth measurements from various depth sensors.
Our method first adopts a learning-based strategy in which a neural network is trained to estimate a dense and complete depth map from polarization data together with a depth map from one of several sensors.
To further improve the performance, we propose a Polarization Prompt Fusion Tuning (PPFT) strategy to effectively utilize RGB-based models pre-trained on large-scale datasets.
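The summary does not specify the PPFT architecture; as a rough sketch of the general idea only (our assumption: freeze the large-scale pretrained RGB model and learn a small branch that injects polarization features as an additive prompt; all names here are hypothetical):

```python
import torch
import torch.nn as nn

class PromptFusion(nn.Module):
    def __init__(self, pretrained_rgb_model: nn.Module, pol_channels: int = 4):
        super().__init__()
        self.backbone = pretrained_rgb_model
        for p in self.backbone.parameters():
            p.requires_grad = False      # keep the large-scale pretraining intact
        # small trainable branch mapping polarization input into RGB input space
        self.prompt = nn.Sequential(
            nn.Conv2d(pol_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, rgb, pol):
        # inject the polarization prompt, then run the frozen pretrained model
        return self.backbone(rgb + self.prompt(pol))
```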
arXiv Detail & Related papers (2024-04-05T17:55:33Z)
- On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks [61.74608497496841]
Training on inaccurate or corrupt data induces model bias and hampers generalisation capabilities.
This paper investigates the effect of sensor errors for the dense 3D vision tasks of depth estimation and reconstruction.
arXiv Detail & Related papers (2023-03-26T22:32:44Z)
- Unsupervised confidence for LiDAR depth maps and applications [43.474845978673166]
We propose an effective unsupervised framework for assessing the reliability of sparse LiDAR depth maps.
Our framework estimates the confidence of the sparse depth map and thus allows for filtering out the outliers.
We demonstrate how this achievement can improve a wide range of tasks.
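A minimal sketch of the filtering step, assuming the confidence map has already been produced by such an unsupervised framework (the threshold name tau is ours):

```python
import numpy as np

def filter_sparse_depth(depth: np.ndarray, conf: np.ndarray, tau: float = 0.5):
    """depth: (H, W) sparse LiDAR depth, 0 where there is no return;
    conf: (H, W) confidence in [0, 1]. Points below tau are treated as outliers."""
    keep = (depth > 0) & (conf >= tau)
    return np.where(keep, depth, 0.0)
```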
arXiv Detail & Related papers (2022-10-06T17:59:58Z)
- Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects [28.84776177634971]
We propose a powerful RGBD fusion network, SwinDRNet, for depth restoration.
We also propose Domain Randomization-Enhanced Depth Simulation (DREDS) approach to simulate an active stereo depth system.
We show that our depth restoration effectively boosts the performance of downstream tasks.
arXiv Detail & Related papers (2022-08-07T19:17:16Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
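DfM builds on classic two-view geometry; as a worked illustration of the underlying relation only (the actual method constructs learned features and cost volumes on top of it), depth for a rectified frame pair follows depth = f * b / d, where the baseline b comes from the known ego-motion:

```python
import numpy as np

def depth_from_ego_motion(disparity: np.ndarray, focal_px: float, baseline_m: float):
    """disparity: (H, W) pixel disparities between two rectified frames;
    baseline_m: translation between the frames, known from camera ego-motion."""
    d = np.where(disparity > 1e-6, disparity, np.nan)  # avoid division by zero
    return focal_px * baseline_m / d                   # depth in metres
```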
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
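As an illustration of how such complementary tasks are typically unified in training (the weights and loss choices here are ours, not necessarily MMFT's):

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred: dict, target: dict, w=(1.0, 1.0, 0.5)) -> torch.Tensor:
    """Weighted sum of the three task losses; weights are illustrative."""
    l_sod = F.binary_cross_entropy_with_logits(pred["sod"], target["sod"])
    l_depth = F.l1_loss(pred["depth"], target["depth"])
    l_contour = F.binary_cross_entropy_with_logits(pred["contour"], target["contour"])
    return w[0] * l_sod + w[1] * l_depth + w[2] * l_contour
```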
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes [68.38952377590499]
We present a novel approach for estimating depth from a monocular camera as it moves through complex indoor environments.
Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people.
arXiv Detail & Related papers (2021-08-12T09:12:39Z)
- EagerMOT: 3D Multi-Object Tracking via Sensor Fusion [68.8204255655161]
Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time.
Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal.
We propose EagerMOT, a simple tracking formulation that integrates all available object observations from both sensor modalities to obtain a well-informed interpretation of the scene dynamics.
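As a sketch of the cross-modality association such a formulation requires (greedy 2D IoU matching between image detections and projected 3D detections; the threshold and greedy scheme are illustrative, not EagerMOT's exact two-stage procedure):

```python
def iou_2d(a, b):
    # axis-aligned IoU for boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def fuse_detections(boxes_2d, boxes_3d_projected, thr=0.3):
    """Pair each projected 3D detection with its best unmatched 2D detection;
    unmatched detections survive as single-modality observations."""
    pairs, used = [], set()
    for i, b3 in enumerate(boxes_3d_projected):
        best_j, best_v = None, thr
        for j, b2 in enumerate(boxes_2d):
            v = iou_2d(b3, b2)
            if j not in used and v > best_v:
                best_j, best_v = j, v
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs
```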
arXiv Detail & Related papers (2021-04-29T22:30:29Z)
- Self-Attention Dense Depth Estimation Network for Unrectified Video Sequences [6.821598757786515]
LiDAR and radar sensors are common hardware solutions for real-time depth estimation.
Deep learning based self-supervised depth estimation methods have shown promising results.
We propose a self-attention based depth and ego-motion network for unrectified images.
arXiv Detail & Related papers (2020-05-28T21:53:53Z)