High-Resolution Synthetic RGB-D Datasets for Monocular Depth Estimation
- URL: http://arxiv.org/abs/2305.01732v1
- Date: Tue, 2 May 2023 19:03:08 GMT
- Title: High-Resolution Synthetic RGB-D Datasets for Monocular Depth Estimation
- Authors: Aakash Rajpal, Noshaba Cheema, Klaus Illgner-Fehns, Philipp Slusallek,
Sunil Jaiswal
- Abstract summary: We generate a high-resolution synthetic depth dataset (HRSD) at a resolution of 1920 × 1080 from Grand Theft Auto V (GTA-V), containing 100,000 color images and corresponding dense ground-truth depth maps.
For experiments and analysis, we train DPT, a state-of-the-art transformer-based MDE algorithm, on the proposed synthetic dataset, which significantly increases the accuracy of depth maps on different scenes by 9%.
- Score: 3.349875948009985
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurate depth maps are essential in various applications, such as
autonomous driving, scene reconstruction, and point-cloud creation. However,
monocular depth estimation (MDE) algorithms often fail to provide enough
texture and sharpness, and are also inconsistent on homogeneous scenes. These
algorithms mostly use CNN- or vision-transformer-based architectures that
require large datasets for supervised training. However, MDE algorithms
trained on the available depth datasets do not generalize well and hence fail
to perform accurately in diverse real-world scenes. Moreover, the ground-truth
depth maps are either low-resolution or sparse, leading to relatively
inconsistent depth maps. In general, acquiring a high-resolution ground-truth
dataset with pixel-level precision for accurate depth prediction is an
expensive and time-consuming challenge.
In this paper, we generate a high-resolution synthetic depth dataset (HRSD) at
a resolution of 1920 × 1080 from Grand Theft Auto V (GTA-V), which contains
100,000 color images and corresponding dense ground-truth depth maps. The
generated dataset is diverse, with scenes ranging from indoor to outdoor and
from homogeneous surfaces to rich textures. For experiments and analysis, we
train DPT, a state-of-the-art transformer-based MDE algorithm, on the proposed
synthetic dataset, which significantly increases the accuracy of depth maps on
different scenes by 9%. Since the synthetic dataset is of higher resolution,
we propose adding a feature extraction module in the transformer encoder and
incorporating an attention-based loss, further improving the accuracy by 15%.
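The abstract does not spell out the attention-based loss. Purely as a hedged illustration, the PyTorch sketch below shows one plausible form of an attention-weighted depth loss, where per-pixel log-depth errors are re-weighted by a heuristic edge-based attention map. The function name and weighting scheme are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an attention-weighted depth regression loss.
# Assumption: attention is derived from ground-truth depth gradients,
# up-weighting pixels near depth discontinuities.
import torch
import torch.nn.functional as F

def attention_weighted_depth_loss(pred, target):
    """pred, target: (B, 1, H, W) positive depth maps; returns a scalar."""
    # Per-pixel absolute error in log-depth (robust to scale differences).
    err = (torch.log(pred.clamp(min=1e-3)) - torch.log(target.clamp(min=1e-3))).abs()
    # Heuristic attention map from ground-truth depth gradients.
    gx = (target[..., :, 1:] - target[..., :, :-1]).abs()
    gy = (target[..., 1:, :] - target[..., :-1, :]).abs()
    grad = F.pad(gx, (0, 1)) + F.pad(gy, (0, 0, 0, 1))  # pad back to (H, W)
    attn = 1.0 + grad / (grad.mean() + 1e-6)            # >= 1 everywhere
    return (attn * err).mean()

# Example with random tensors standing in for a training batch.
pred = torch.rand(2, 1, 192, 256) * 50 + 1
target = torch.rand(2, 1, 192, 256) * 50 + 1
print(attention_weighted_depth_loss(pred, target))
```

The design intuition is that depth discontinuities are where MDE predictions most often blur, so up-weighting those pixels pushes the network toward sharper boundaries.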
Related papers
- Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution [55.9977636042469] (2024-11-05)
Bit-depth compression produces a uniform depth representation in regions with subtle variations, hindering the recovery of detailed information, while densely distributed random noise reduces the accuracy of estimating the global geometric structure of the scene.
We propose a novel framework, termed geometry-decoupled network (GDNet), for compressed depth map super-resolution.
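As a rough sketch of the decoupling idea only (not GDNet's actual architecture; the Gaussian low-pass split and one-layer branches are stand-ins), the following separates a compressed depth map into a smooth global-geometry component and a fine-detail residual, restores each in its own branch, and recombines them:

```python
# Illustrative decoupling of global geometry and fine detail.
import torch
import torch.nn as nn
import torch.nn.functional as F

def decouple(depth, kernel=9, sigma=3.0):
    """Split depth (B,1,H,W) into low- and high-frequency parts."""
    coords = torch.arange(kernel) - kernel // 2
    g = torch.exp(-coords.float() ** 2 / (2 * sigma ** 2))
    g = (g / g.sum()).view(1, 1, 1, -1)        # separable Gaussian kernel
    pad = kernel // 2
    low = F.conv2d(F.pad(depth, (pad, pad, 0, 0), mode="replicate"), g)
    low = F.conv2d(F.pad(low, (0, 0, pad, pad), mode="replicate"),
                   g.transpose(2, 3))
    return low, depth - low                    # global geometry, fine detail

class TwoBranchRestorer(nn.Module):
    def __init__(self):
        super().__init__()
        # Each conv is a stand-in for a much deeper sub-network.
        self.geometry_branch = nn.Conv2d(1, 1, 3, padding=1)
        self.detail_branch = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, compressed_depth):
        low, high = decouple(compressed_depth)
        return self.geometry_branch(low) + self.detail_branch(high)

x = torch.rand(1, 1, 64, 64)
print(TwoBranchRestorer()(x).shape)  # torch.Size([1, 1, 64, 64])
```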
- Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation [12.757150641117077] (2024-06-22)
"Shape2.5D" is a novel, large-scale dataset designed to address this gap.
The proposed dataset includes synthetic images rendered with 3D modeling software.
It also includes a real-world subset comprising 4,672 frames captured with a depth camera.
- SelfReDepth: Self-Supervised Real-Time Depth Restoration for Consumer-Grade Sensors [42.48726526726542] (2024-06-05)
SelfReDepth is a self-supervised deep learning technique for depth restoration.
It uses multiple sequential depth frames and color data to achieve high-quality depth videos with temporal coherence.
Our results demonstrate our approach's real-time performance on real-world datasets.
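A minimal sketch of the general idea of temporally coherent depth restoration, assuming simple hole-filling from history plus an exponential moving average; SelfReDepth's actual pipeline is learned and considerably more sophisticated:

```python
# Toy temporal fusion of a depth sequence (illustrative assumption only).
import numpy as np

def fuse_depth_sequence(frames, invalid=0.0, alpha=0.7):
    """frames: list of (H, W) depth arrays, oldest first.
    Holes (== invalid) are filled from history, then an exponential
    moving average enforces temporal coherence."""
    fused = frames[0].astype(np.float64)
    for frame in frames[1:]:
        cur = frame.astype(np.float64)
        holes = cur == invalid
        cur[holes] = fused[holes]            # borrow depth from history
        valid = cur != invalid
        fused[valid] = alpha * cur[valid] + (1 - alpha) * fused[valid]
    return fused

# Example with synthetic noisy frames containing sensor dropouts.
rng = np.random.default_rng(0)
frames = []
for _ in range(5):
    d = 2.0 + 0.05 * rng.standard_normal((4, 4))
    d[rng.random((4, 4)) < 0.2] = 0.0        # simulate holes
    frames.append(d)
print(fuse_depth_sequence(frames))
```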
- Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation? [61.234412062595155] (2024-04-15)
We present ANYU, a new virtually augmented version of the NYU depth v2 dataset, designed for monocular depth estimation.
In contrast to the well-known approach of generating artificial datasets from full 3D scenes of a virtual world, ANYU was created by incorporating RGB-D representations of virtual reality objects.
We show that ANYU improves the monocular depth estimation performance and generalization of deep neural networks with considerably different architectures.
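One hedged reading of this augmentation strategy is depth-aware compositing: paste a virtual object's RGB-D patch into a real frame, letting the object win a pixel only where it is closer to the camera. The sketch below is illustrative, not ANYU's released tooling:

```python
# Depth-aware RGB-D compositing (illustrative assumption).
import numpy as np

def composite_rgbd(real_rgb, real_depth, obj_rgb, obj_depth, mask):
    """All images share one resolution; mask marks object pixels.
    The object replaces the scene only where it is in front of it."""
    front = mask & (obj_depth < real_depth)
    out_rgb = real_rgb.copy()
    out_depth = real_depth.copy()
    out_rgb[front] = obj_rgb[front]
    out_depth[front] = obj_depth[front]
    return out_rgb, out_depth

h, w = 4, 4
real_rgb = np.zeros((h, w, 3), np.uint8)
real_depth = np.full((h, w), 3.0)
obj_rgb = np.full((h, w, 3), 255, np.uint8)
obj_depth = np.full((h, w), 1.5)
mask = np.zeros((h, w), bool)
mask[1:3, 1:3] = True                      # object occupies a 2x2 patch
rgb, depth = composite_rgbd(real_rgb, real_depth, obj_rgb, obj_depth, mask)
print(depth)
```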
- SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model [72.0795843450604] (2024-03-13)
Current approaches face challenges in maintaining consistent accuracy across diverse scenes.
These methods rely on extensive datasets comprising millions, if not tens of millions, of samples for training.
This paper presents SM4Depth, a single model that works seamlessly for both indoor and outdoor scenes.
- G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data [36.24020602917672] (2023-10-24)
Monocular depth inference is a fundamental problem for scene perception of robots.
G2-MonoDepth is applied in three sub-tasks including depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes.
It consistently outperforms state-of-the-art baselines on both real-world and synthetic data.
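A minimal sketch of how a single "RGB+X" network could cover all three sub-tasks, assuming X is an optional depth channel with a validity mask (estimation: X empty; completion: X sparse; enhancement: X dense but noisy). The architecture is a stand-in, not the paper's:

```python
# One network for RGB+X depth inference (illustrative assumption).
import torch
import torch.nn as nn

class RGBXNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(5, 16, 3, padding=1), nn.ReLU(),  # 3 RGB + depth + mask
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, rgb, depth_x=None):
        b, _, h, w = rgb.shape
        if depth_x is None:                  # pure monocular estimation
            depth_x = torch.zeros(b, 1, h, w)
        mask = (depth_x > 0).float()         # which X pixels are valid
        return self.net(torch.cat([rgb, depth_x, mask], dim=1))

net = RGBXNet()
rgb = torch.rand(1, 3, 32, 32)
sparse = torch.where(torch.rand(1, 1, 32, 32) < 0.05,       # ~5% valid
                     torch.rand(1, 1, 32, 32) * 10,
                     torch.zeros(1, 1, 32, 32))
print(net(rgb).shape, net(rgb, sparse).shape)
```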
- RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion [28.634851863097953] (2023-06-06)
We propose a novel two-branch end-to-end fusion network named RDFC-GAN.
It takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map.
The first branch employs an encoder-decoder structure that adheres to the Manhattan world assumption.
The other branch applies an RGB-depth fusion CycleGAN, adept at translating RGB imagery into detailed, textured depth maps.
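Structurally, the two-branch layout can be sketched as below (toy stand-in layers; the real branches are a Manhattan-guided encoder-decoder and a CycleGAN translator, trained adversarially):

```python
# Two-branch depth completion layout (illustrative assumption).
import torch
import torch.nn as nn

class TwoBranchDepthCompletion(nn.Module):
    def __init__(self):
        super().__init__()
        self.depth_branch = nn.Sequential(   # encoder-decoder stand-in
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))
        self.rgb_branch = nn.Sequential(     # RGB-to-depth translator stand-in
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))
        self.fuse = nn.Conv2d(2, 1, 1)       # per-pixel fusion of both maps

    def forward(self, rgb, raw_depth):
        d1 = self.depth_branch(raw_depth)
        d2 = self.rgb_branch(rgb)
        return self.fuse(torch.cat([d1, d2], dim=1))

net = TwoBranchDepthCompletion()
print(net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64)).shape)
```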
- BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation [60.34562823470874] (2021-07-27)
We propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels, linking the two tasks through two bridges.
One is the high-frequency attention bridge (HABdg), designed for the feature encoding process, which learns high-frequency information from the MDE task to guide the DSR task.
The other is the content guidance bridge (CGBdg), designed for the depth map reconstruction process, which provides content guidance learned from the DSR task to the MDE task.
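The bridge layout can be sketched as two task networks exchanging intermediate features through small bridge modules; the HABdg/CGBdg stand-ins below are 1×1 convolutions for illustration only, not the paper's modules:

```python
# Cross-task feature bridges between MDE and DSR (illustrative assumption).
import torch
import torch.nn as nn

class BridgedPair(nn.Module):
    def __init__(self):
        super().__init__()
        self.mde_enc = nn.Conv2d(3, 16, 3, padding=1)   # MDE encoder stage
        self.dsr_enc = nn.Conv2d(1, 16, 3, padding=1)   # DSR encoder stage
        self.bridge_mde_to_dsr = nn.Conv2d(16, 16, 1)   # HABdg stand-in
        self.bridge_dsr_to_mde = nn.Conv2d(16, 16, 1)   # CGBdg stand-in
        self.mde_head = nn.Conv2d(16, 1, 3, padding=1)
        self.dsr_head = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, rgb, lr_depth_up):
        f_mde = torch.relu(self.mde_enc(rgb))
        f_dsr = torch.relu(self.dsr_enc(lr_depth_up))
        f_dsr = f_dsr + self.bridge_mde_to_dsr(f_mde)   # MDE guides DSR
        f_mde = f_mde + self.bridge_dsr_to_mde(f_dsr)   # DSR guides MDE
        return self.mde_head(f_mde), self.dsr_head(f_dsr)

net = BridgedPair()
mde, dsr = net(torch.rand(1, 3, 32, 32), torch.rand(1, 1, 32, 32))
print(mde.shape, dsr.shape)
```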
- Towards Unpaired Depth Enhancement and Super-Resolution in the Wild [121.96527719530305] (2021-05-25)
State-of-the-art data-driven methods of depth map super-resolution rely on registered pairs of low- and high-resolution depth maps of the same scenes.
We consider an approach to depth map enhancement based on learning from unpaired data.
- Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline [48.69396457721544] (2021-04-13)
We build a large-scale dataset named "RGB-D-D" to promote the study of depth map super-resolution (SR).
We also provide a fast depth map super-resolution (FDSR) baseline, in which a high-frequency component adaptively decomposed from the RGB image guides the depth map SR.
For real-world LR depth maps, our algorithm produces more accurate HR depth maps with clearer boundaries and, to some extent, corrects depth value errors.
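A hedged simplification of the guidance idea: take the high-frequency residual of the RGB image (image minus its blur) and feed it alongside the upsampled low-resolution depth into a small refinement head. FDSR's actual decomposition and network are more elaborate:

```python
# RGB high-frequency guidance for depth super-resolution (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

def high_frequency(rgb, kernel=5):
    """High-frequency residual via a box blur; rgb is (B,3,H,W)."""
    blur = F.avg_pool2d(rgb, kernel, stride=1, padding=kernel // 2)
    return rgb - blur

class GuidedDepthSR(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.head = nn.Sequential(           # tiny refinement head stand-in
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, lr_depth, hr_rgb):
        up = F.interpolate(lr_depth, scale_factor=self.scale,
                           mode="bilinear", align_corners=False)
        hf = high_frequency(hr_rgb).mean(dim=1, keepdim=True)
        return up + self.head(torch.cat([up, hf], dim=1))  # residual refine

net = GuidedDepthSR()
print(net(torch.rand(1, 1, 16, 16), torch.rand(1, 3, 64, 64)).shape)
```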