G2-MonoDepth: A General Framework of Generalized Depth Inference from
Monocular RGB+X Data
- URL: http://arxiv.org/abs/2310.15422v1
- Date: Tue, 24 Oct 2023 00:28:24 GMT
- Authors: Haotian Wang, Meng Yang, and Nanning Zheng
- Abstract summary: Monocular depth inference is a fundamental problem for scene perception of robots.
G2-MonoDepth is applied to three sub-tasks: depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes.
It consistently outperforms SOTA baselines on both real-world and synthetic data.
- Score: 36.24020602917672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth inference is a fundamental problem for scene perception of
robots. A specific robot may be equipped with a camera plus an optional depth
sensor of any type and may operate in scenes of widely different scales, yet
recent advances have split this setting into multiple individual sub-tasks.
This fragmentation adds the burden of fine-tuning a model for each specific
robot, and thereby high-cost customization in large-scale industrialization.
This paper investigates a unified task of monocular depth inference, which
infers high-quality depth maps from all kinds of raw input data produced by
various robots in unseen scenes. A basic benchmark, G2-MonoDepth, is developed
for this task; it comprises four components: (a) a unified data representation
RGB+X that accommodates RGB plus raw depth with diverse scene scales/semantics,
depth sparsity ([0%, 100%]), and errors (holes/noise/blur); (b) a novel unified
loss that adapts to the diverse depth sparsity/errors of input raw data and the
diverse scales of output scenes; (c) an improved network that propagates
diverse scene scales well from input to output; and (d) a data augmentation
pipeline that simulates all types of real artifacts in raw depth maps for
training. G2-MonoDepth is applied to three sub-tasks, namely depth estimation,
depth completion with different sparsity, and depth enhancement in unseen
scenes, and it consistently outperforms SOTA baselines on both real-world and
synthetic data.
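To make components (a) and (d) concrete, below is a minimal NumPy sketch of how a clean ground-truth depth map might be degraded into a raw RGB+X training sample. The function names (simulate_raw_depth, pack_rgb_x), the box-filter blur, the rectangular hole model, and the channel layout are all illustrative assumptions; the abstract only states that sparsity spans [0%, 100%] and that holes, noise, and blur are simulated.

```python
import numpy as np

def simulate_raw_depth(depth, keep_ratio=0.5, num_holes=4,
                       noise_std=0.02, blur=True, seed=None):
    """Degrade a clean depth map into a plausible raw sensor map.

    Hypothetical stand-in for the paper's augmentation pipeline: it adds
    depth-proportional noise, blur, rectangular holes, and random sparsity
    with keep_ratio in [0, 1] (i.e. sparsity spanning [0%, 100%]).
    """
    rng = np.random.default_rng(seed)
    h, w = depth.shape
    x = depth.astype(np.float32)

    # Noise: range error that grows with distance.
    x = x + rng.normal(0.0, noise_std, (h, w)).astype(np.float32) * x

    # Blur: a 3x3 box filter as a cheap stand-in for optical/sensor blur.
    if blur:
        p = np.pad(x, 1, mode="edge")
        x = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

    # Holes: rectangular regions with no sensor return, encoded as 0.
    for _ in range(num_holes):
        hh = int(rng.integers(h // 8, h // 4))
        ww = int(rng.integers(w // 8, w // 4))
        r, c = int(rng.integers(0, h - hh)), int(rng.integers(0, w - ww))
        x[r:r + hh, c:c + ww] = 0.0

    # Sparsity: keep only a random subset of the surviving pixels.
    mask = (rng.random((h, w)) < keep_ratio) & (x > 0)
    return np.where(mask, x, 0.0).astype(np.float32), mask.astype(np.float32)

def pack_rgb_x(rgb, raw_depth, valid_mask):
    """Stack RGB with the raw depth channel X plus its validity mask.

    One plausible reading of the RGB+X representation; the paper's actual
    channel layout is not specified in the abstract.
    """
    return np.concatenate([rgb.astype(np.float32) / 255.0,
                           raw_depth[..., None],
                           valid_mask[..., None]], axis=-1)
```

Under this reading, keep_ratio=0.0 yields an empty X channel (pure depth estimation), small values mimic sparse completion inputs, and values near 1.0 combined with noise/blur correspond to depth enhancement.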
Related papers
- Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation? (arXiv, 2024-04-15)
We present ANYU, a new virtually augmented version of the NYU depth v2 dataset, designed for monocular depth estimation.
In contrast to the well-known approach where full 3D scenes of a virtual world are utilized to generate artificial datasets, ANYU was created by incorporating RGB-D representations of virtual reality objects.
We show that ANYU improves the monocular depth estimation performance and generalization of deep neural networks with considerably different architectures.
- SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model (arXiv, 2024-03-13)
Current approaches face challenges in maintaining consistent accuracy across diverse scenes.
These methods rely on extensive datasets comprising millions, if not tens of millions, of samples for training.
This paper presents SM4Depth, a model that seamlessly works for both indoor and outdoor scenes.
- High-Resolution Synthetic RGB-D Datasets for Monocular Depth Estimation (arXiv, 2023-05-02)
We generate a high-resolution synthetic depth dataset (HRSD) at 1920x1080 resolution from Grand Theft Auto V (GTA-V), containing 100,000 color images and corresponding dense ground-truth depth maps.
For experiments and analysis, we train DPT, a state-of-the-art transformer-based MDE algorithm, on the proposed synthetic dataset, which increases depth-map accuracy on different scenes by 9%.
- Pyramidal Attention for Saliency Detection (arXiv, 2022-04-14)
This paper uses only RGB images, estimating depth from RGB and leveraging the intermediate depth features.
We employ a pyramidal attention structure that extracts multi-level convolutional-transformer features to refine the initial-stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets.
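As a loose illustration of the pyramidal attention idea in the entry above, the PyTorch sketch below fuses multi-level backbone features with per-level spatial attention. The module name, projection widths, and sigmoid gating are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidalAttentionFusion(nn.Module):
    """Project multi-level features to one width, gate each level with a
    spatial attention map, and sum them at the finest resolution."""

    def __init__(self, channels=(64, 128, 256), out_ch=64):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in channels)
        self.gate = nn.ModuleList(
            nn.Sequential(nn.Conv2d(out_ch, 1, 1), nn.Sigmoid())
            for _ in channels)

    def forward(self, feats):
        # feats: fine-to-coarse pyramid; feats[0] sets the target resolution.
        size = feats[0].shape[-2:]
        fused = 0.0
        for f, proj, gate in zip(feats, self.proj, self.gate):
            f = F.interpolate(proj(f), size=size, mode="bilinear",
                              align_corners=False)
            fused = fused + gate(f) * f
        return fused

# Toy usage with invented feature shapes:
feats = [torch.randn(1, 64, 56, 56), torch.randn(1, 128, 28, 28),
         torch.randn(1, 256, 14, 14)]
out = PyramidalAttentionFusion()(feats)  # -> (1, 64, 56, 56)
```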
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction (arXiv, 2022-03-09)
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
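A minimal sketch of the shared-backbone, three-head layout implied by the summary above; the paper's multi-modal filtered-transformer fusion is not reproduced, and all layer sizes are invented for illustration.

```python
import torch.nn as nn

class ThreeTaskNet(nn.Module):
    """One shared encoder feeding depth, saliency, and contour heads."""

    def __init__(self, ch=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.depth_head = nn.Conv2d(ch, 1, 1)     # depth regression
        self.saliency_head = nn.Conv2d(ch, 1, 1)  # saliency logits
        self.contour_head = nn.Conv2d(ch, 1, 1)   # contour logits

    def forward(self, rgb):
        f = self.backbone(rgb)
        return self.depth_head(f), self.saliency_head(f), self.contour_head(f)
```

The multi-task effect then comes from summing the three task losses during training, so the shared features become task-aware.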
- Unsupervised Single-shot Depth Estimation using Perceptual Reconstruction (arXiv, 2022-01-28)
This study leverages recent advances in generative neural networks to perform fully unsupervised single-shot depth synthesis.
Two generators for RGB-to-depth and depth-to-RGB transfer are implemented and simultaneously optimized using the Wasserstein-1 distance and a novel perceptual reconstruction term.
The results suggest great potential for unsupervised single-shot depth estimation in real-world applications.
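The two training terms named above can be sketched as follows. The critic, feature extractor, and weighting are not specified in the summary, so `critic`, `features`, and `lam` here are hypothetical placeholders.

```python
import torch.nn.functional as F

def w1_generator_term(critic_score_fake):
    # Wasserstein-1 generator objective: push the critic's score on
    # generated samples upward (minimize its negation).
    return -critic_score_fake.mean()

def perceptual_reconstruction_term(features, x, x_cycled):
    # Compare deep feature activations of an input and its cycle
    # reconstruction (e.g. RGB -> depth -> RGB) instead of raw pixels.
    return F.l1_loss(features(x_cycled), features(x))

# Hypothetical combined objective for an RGB-to-depth generator G_d,
# given a depth-to-RGB generator G_rgb and a pretrained feature net:
#   loss = w1_generator_term(critic(G_d(rgb))) \
#        + lam * perceptual_reconstruction_term(features, rgb, G_rgb(G_d(rgb)))
```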
- Unpaired Single-Image Depth Synthesis with cycle-consistent Wasserstein GANs (arXiv, 2021-03-31)
Real-time estimation of actual environment depth is essential for various autonomous system tasks.
In this study, the latest advances in generative neural networks are leveraged for fully unsupervised single-image depth synthesis.
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion (arXiv, 2021-03-30)
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module that enables monodepth networks to perform both depth prediction and depth completion.
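The unified prediction/completion interface described above can be sketched as a forward pass with an optional sparse-depth argument. This toy module only illustrates that interface; it is not the SAN design itself.

```python
import torch
import torch.nn as nn

class OptionalSparseDepthNet(nn.Module):
    """Predict dense depth from RGB, optionally fusing sparse measurements."""

    def __init__(self, ch=32):
        super().__init__()
        self.rgb_enc = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.sparse_enc = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.head = nn.Conv2d(ch, 1, 1)

    def forward(self, rgb, sparse_depth=None):
        f = self.rgb_enc(rgb)
        if sparse_depth is not None:
            # Completion mode: fuse the sparse map and its validity mask.
            valid = (sparse_depth > 0).float()
            f = f + self.sparse_enc(torch.cat([sparse_depth, valid], dim=1))
        # Prediction mode simply skips the sparse branch.
        return self.head(f)
```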
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning (arXiv, 2020-09-30)
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing methods by a large margin.
This list is automatically generated from the titles and abstracts of the papers on this site.