Joint Prediction of Monocular Depth and Structure using Planar and
Parallax Geometry
- URL: http://arxiv.org/abs/2207.06351v1
- Date: Wed, 13 Jul 2022 17:04:05 GMT
- Title: Joint Prediction of Monocular Depth and Structure using Planar and
Parallax Geometry
- Authors: Hao Xing, Yifan Cao, Maximilian Biber, Mingchuan Zhou, Darius Burschka
- Abstract summary: Supervised learning depth estimation methods can achieve good performance when trained on high-quality ground-truth, like LiDAR data.
We propose a novel approach combining structure information from a promising Plane and Parallax geometry pipeline with depth information into a U-Net supervised learning network.
Our model performs impressively on depth prediction for thin objects and edges, and it is more robust than the structure prediction baseline.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised depth estimation methods can achieve good performance
when trained on high-quality ground truth, such as LiDAR data. However, LiDAR
generates only sparse 3D maps, which leads to information loss, and
high-quality per-pixel ground-truth depth is difficult to acquire. To overcome
this limitation, we propose a novel approach combining
structure information from a promising Plane and Parallax geometry pipeline
with depth information into a U-Net supervised learning network, which results
in quantitative and qualitative improvement compared to existing popular
learning-based methods. In particular, the model is evaluated on two
large-scale and challenging datasets, the KITTI Vision Benchmark and the
Cityscapes dataset, and achieves the best performance in terms of relative
error. Compared with pure depth-supervision models, our model performs
impressively on depth prediction for thin objects and edges, and compared to
the structure prediction baseline, it is more robust.
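The Plane and Parallax decomposition that the pipeline builds on can be sketched concretely: after warping one frame onto another with the homography of a reference plane, the residual flow points along lines through the epipole, and its magnitude encodes the projective structure gamma = H/Z (height above the plane over depth) up to a global scale. A minimal NumPy sketch of that recovery, with hypothetical names and array shapes (not the paper's code):

```python
import numpy as np

def parallax_structure(residual_flow, pixels, epipole):
    """Per-pixel projective structure gamma = H/Z from residual parallax.

    residual_flow: (N, 2) displacement left after warping by the
                   reference-plane homography
    pixels:        (N, 2) pixel coordinates in the aligned frame
    epipole:       (2,)   epipole of the camera translation

    gamma is recovered only up to a global scale that depends on the
    camera translation and the distance to the reference plane.
    """
    to_epipole = pixels - epipole                       # parallax direction
    dist = np.linalg.norm(to_epipole, axis=1) + 1e-9
    # Project the residual onto the epipolar direction; its signed
    # magnitude divided by the distance to the epipole gives gamma
    # up to scale (the classic plane-plus-parallax decomposition).
    along = np.sum(residual_flow * to_epipole, axis=1) / dist
    return along / dist
```

The recovered gamma (rather than metric depth) is the "structure information" that a network like the proposed U-Net can consume alongside depth supervision.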
Related papers
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation [38.81275292687583]
We propose Plane2Depth, which adaptively utilizes plane information to improve depth prediction within a hierarchical framework.
In the proposed plane guided depth generator (PGDG), we design a set of plane queries as prototypes to softly model planes in the scene and predict per-pixel plane coefficients.
In the proposed adaptive plane query aggregation (APGA) module, we introduce a novel feature interaction approach to improve the aggregation of multi-scale plane features.
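For context, per-pixel plane coefficients determine depth in closed form under a pinhole model: if a pixel lies on the plane n . P = d, its depth is d divided by the plane normal dotted with the back-projected ray. A hedged sketch of this generic geometry (not the paper's PGDG implementation; names are hypothetical):

```python
import numpy as np

def depth_from_plane(n, d, uv, K_inv):
    """Depth of pixel (u, v) assuming it lies on the plane n . P = d.

    n:     (3,) plane normal in camera coordinates
    d:     plane offset, so points P on the plane satisfy n . P = d
    uv:    (2,) pixel coordinates
    K_inv: (3, 3) inverse camera intrinsics
    """
    ray = K_inv @ np.array([uv[0], uv[1], 1.0])  # back-projected ray
    return d / (n @ ray)                         # since P = depth * ray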
arXiv Detail & Related papers (2024-09-04T07:45:06Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds [29.15589024703907]
In this paper, we revisit the local point aggregators from the perspective of allocating computational resources.
We find that the simplest pillar based models perform surprisingly well considering both accuracy and latency.
Our results challenge the common intuition that detailed geometry modeling is essential for high-performance 3D object detection.
arXiv Detail & Related papers (2023-05-08T17:59:14Z)
- Deep Planar Parallax for Monocular Depth Estimation [24.801102342402828]
In-depth analysis reveals that flow pre-training can improve the network's use of consecutive-frame modeling.
We also propose Planar Position Embedding to handle dynamic objects that defy static scene assumptions.
arXiv Detail & Related papers (2023-01-09T06:02:36Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
However, it relies on the multi-view consistency assumption for training, which is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- DenseLiDAR: A Real-Time Pseudo Dense Depth Guided Depth Completion Network [3.1447111126464997]
We propose DenseLiDAR, a novel real-time pseudo-depth guided depth completion neural network.
We exploit a dense pseudo-depth map obtained from simple morphological operations to guide the network.
Our model achieves state-of-the-art performance at the highest frame rate, 50 Hz.
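The kind of "simple morphological operations" the summary mentions can be illustrated with a crude hole-filling pass: empty pixels take the nearest-surface (minimum non-zero) depth within a small window, a stand-in for the dilation/closing a real pipeline would use. A hypothetical NumPy sketch, not the DenseLiDAR code:

```python
import numpy as np

def pseudo_dense_depth(sparse, k=3):
    """Fill empty pixels (value 0) of a sparse LiDAR depth map with the
    minimum valid depth in a k x k window (a crude morphological fill).
    Choosing the minimum keeps the nearest surface, which is usually the
    safer guess for occlusion boundaries."""
    H, W = sparse.shape
    pad = k // 2
    padded = np.pad(sparse, pad, mode="constant")
    out = sparse.copy()
    for i in range(H):
        for j in range(W):
            if out[i, j] == 0:
                win = padded[i:i + k, j:j + k]
                vals = win[win > 0]
                if vals.size:
                    out[i, j] = vals.min()
    return out
```

A real-time system would replace the Python loops with a vectorized or GPU max/min filter, but the idea is the same: densify first, then let the network refine.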
arXiv Detail & Related papers (2021-08-28T14:18:29Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Without extra data, our method improves the detection performance of the state-of-the-art monocular method by a remarkable 2.80% on the moderate test setting.
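The projective relation such geometry formulas exploit is the classic pinhole identity: an object of physical height H at depth Z projects to h = f * H / Z pixels, so depth follows directly from a predicted 3D height and an observed 2D box height. A minimal sketch of that identity (illustrative only, not the paper's exact formulation):

```python
def depth_from_height(f_pixels, height_3d_m, height_2d_px):
    """Pinhole relation h = f * H / Z, solved for the depth Z.

    f_pixels:     focal length in pixels
    height_3d_m:  predicted physical object height (meters)
    height_2d_px: observed 2D bounding-box height (pixels)
    """
    return f_pixels * height_3d_m / height_2d_px
```

For example, a 1.5 m tall object spanning 70 pixels under a 700-pixel focal length sits roughly 15 m away.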
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
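A high-order constraint of this kind can be sketched concretely: back-project three sampled pixels with their predicted depths, take the normal of the plane they span, and penalize its deviation from the normal computed with ground-truth depths. A hedged NumPy sketch of the normal computation (names hypothetical, not the paper's code):

```python
import numpy as np

def virtual_normal(depths, uvs, K_inv):
    """Unit normal of the plane through three back-projected pixels.

    depths: (3,) predicted depths for three sampled pixels
    uvs:    (3, 2) pixel coordinates
    K_inv:  (3, 3) inverse camera intrinsics
    """
    rays = (K_inv @ np.c_[uvs, np.ones(3)].T).T   # (3, 3) back-projected rays
    pts = depths[:, None] * rays                  # 3D points in camera frame
    n = np.cross(pts[1] - pts[0], pts[2] - pts[0])
    return n / np.linalg.norm(n)
```

A loss would then compare this normal against the one obtained from ground-truth depth over many sampled triplets, constraining long-range 3D geometry rather than individual pixels.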
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
- SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performances on two large-scale datasets, i.e., KITTI and Apollo-SouthBay.
arXiv Detail & Related papers (2020-10-19T09:23:39Z)
- Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z)
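One common way to remove the scale ambiguity such a system must handle is to normalize each predicted depth map by a robust statistic (for example, its median) before computing losses, so the network learns depth only up to scale while the scale factor is handled separately. A minimal sketch of that normalization (an assumption about the general technique, not this paper's exact mechanism):

```python
import numpy as np

def normalize_depth(depth):
    """Divide a depth map by its median so the network target is
    scale-invariant; the returned scale can be estimated or aligned
    by a separate component."""
    scale = np.median(depth)
    return depth / scale, scale
```

After this step, depth and pose estimates drawn from different samples share a consistent (unit-median) scale, which is the inconsistency the paper sets out to disentangle.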
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.