Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR
- URL: http://arxiv.org/abs/2109.09628v2
- Date: Tue, 21 Sep 2021 16:37:15 GMT
- Title: Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR
- Authors: Ziyue Feng, Longlong Jing, Peng Yin, Yingli Tian, Bing Li
- Abstract summary: We propose a novel two-stage network to advance self-supervised monocular dense depth learning.
Our model fuses monocular image features and sparse LiDAR features to predict initial depth maps.
Our model outperforms the state-of-the-art sparse-LiDAR-based method (Pseudo-LiDAR++) by more than 68% on the downstream task of monocular 3D object detection.
- Score: 22.202192422883122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised monocular depth prediction provides a cost-effective solution
to obtain the 3D location of each pixel. However, the existing approaches
usually lead to unsatisfactory accuracy, which is critical for autonomous
robots. In this paper, we propose a novel two-stage network to advance the
self-supervised monocular dense depth learning by leveraging low-cost sparse
(e.g., 4-beam) LiDAR. Unlike existing methods that mainly use sparse LiDAR
for time-consuming iterative post-processing, our model fuses monocular image
features and sparse LiDAR features to predict initial depth maps. Then, an
efficient feed-forward refinement network corrects the errors in these initial
depth maps in pseudo-3D space with
real-time performance. Extensive experiments show that our proposed model
significantly outperforms all state-of-the-art self-supervised methods, as
well as the sparse-LiDAR-based methods, on both the self-supervised monocular
depth prediction and completion tasks. With the accurate dense depth
prediction, our model outperforms the state-of-the-art sparse-LiDAR-based
method (Pseudo-LiDAR++) by more than 68% on the downstream task of monocular
3D object detection on the KITTI Leaderboard.
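To make the pipeline concrete, here is a minimal PyTorch sketch of the two-stage design described above: a fusion network that predicts an initial dense depth map from the image and a projected sparse-LiDAR depth map, followed by a single feed-forward refinement pass. All module names, channel sizes, and the concatenation-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch of a two-stage depth network that fuses monocular image
# features with sparse (e.g., 4-beam) LiDAR features. Details are assumptions,
# not the paper's exact design.
import torch
import torch.nn as nn

class FusionDepthNet(nn.Module):
    """Stage 1: predict an initial dense depth map from image + sparse depth."""
    def __init__(self):
        super().__init__()
        self.image_enc = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.lidar_enc = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Softplus(),  # positive depth
        )

    def forward(self, image, sparse_depth):
        # sparse_depth: H x W map, zero where there is no LiDAR return.
        feats = torch.cat([self.image_enc(image), self.lidar_enc(sparse_depth)], dim=1)
        return self.decoder(feats)

class RefineNet(nn.Module):
    """Stage 2: one feed-forward pass that corrects the initial depth."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, init_depth, sparse_depth):
        # Predict a residual correction rather than depth from scratch.
        return init_depth + self.net(torch.cat([init_depth, sparse_depth], dim=1))

image = torch.rand(1, 3, 192, 640)
sparse = torch.zeros(1, 1, 192, 640)  # mostly empty 4-beam projection
init_depth = FusionDepthNet()(image, sparse)
refined = RefineNet()(init_depth, sparse)
```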
Related papers
- Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis [53.702118455883095]
We propose a method for synthesizing novel views from sparse views with Gaussian Splatting.
Our key idea lies in exploiting the self-supervision inherent in the binocular stereo consistency between each pair of binocular images.
Our method significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-10-24T15:10:27Z)
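The binocular self-supervision referenced above is typically realized as a photometric consistency loss: one view is warped into the other using a predicted disparity, and the reconstruction error supervises the model. A generic sketch follows; the disparity-based warp and L1 photometric term are standard ingredients, not necessarily this paper's exact loss.

```python
# Generic stereo photometric-consistency loss: warp the right image into the
# left view with a predicted disparity and penalize the photometric error.
import torch
import torch.nn.functional as F

def stereo_photometric_loss(left, right, disparity):
    """left, right: (B,3,H,W); disparity: (B,1,H,W) in pixels (left-to-right)."""
    b, _, h, w = left.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    xs = xs.float().expand(b, -1, -1) - disparity.squeeze(1)  # shift by disparity
    ys = ys.float().expand(b, -1, -1)
    # Normalize the sampling grid to [-1, 1] for grid_sample.
    grid = torch.stack([2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1], dim=-1)
    right_warped = F.grid_sample(right, grid, align_corners=True)
    return (left - right_warped).abs().mean()
```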
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
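Handling an unbounded perceptive range usually relies on a coordinate contraction that maps all of space into a bounded volume before sampling. The sketch below shows one common form (a mip-NeRF 360-style contraction); whether OccNeRF uses exactly this parameterization is an assumption.

```python
# A common scene contraction for unbounded ranges: points inside the unit ball
# are kept, points outside are squashed so all of space maps into a radius-2
# ball. Illustrative only; not necessarily OccNeRF's exact parameterization.
import numpy as np

def contract(x, eps=1e-8):
    """x: (N, 3) points -> contracted points inside a radius-2 ball."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe = np.maximum(norm, eps)
    scaled = (2.0 - 1.0 / safe) * (x / safe)
    return np.where(norm <= 1.0, x, scaled)
```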
- ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection [15.204935788297226]
The ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training.
By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points.
Our method ranks 1st on both the KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised.
arXiv Detail & Related papers (2023-10-28T07:12:09Z)
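Cross-modal distillation of this kind is commonly implemented as a feature-imitation loss between a frozen LiDAR-based teacher and the monocular student, often weighted toward foreground regions. The L2 form sketched below is a common choice, not necessarily ODM3D's exact objective.

```python
# Generic cross-modal feature distillation: the monocular student mimics
# intermediate features of a frozen LiDAR-trained teacher.
import torch

def feature_distillation_loss(student_feat, teacher_feat, fg_mask):
    """student_feat, teacher_feat: (B,C,H,W); fg_mask: (B,1,H,W) foreground weights."""
    teacher_feat = teacher_feat.detach()  # teacher is frozen
    diff = (student_feat - teacher_feat) ** 2
    # Weight the imitation toward foreground (object) regions.
    return (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
```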
- MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds [13.426810473131642]
Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction.
In a novel reconstruction approach, MAELi distinguishes between empty and occluded space.
Without any ground truth and trained on single frames only, MAELi thereby obtains an understanding of the underlying 3D scene geometry and semantics.
arXiv Detail & Related papers (2022-12-14T13:10:27Z)
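The masked-autoencoder recipe amounts to voxelizing the cloud and hiding a large fraction of the occupied voxels so that the network must reconstruct them. A minimal masking step is sketched below; the mask ratio and voxel size are illustrative, not MAELi's settings.

```python
# Minimal LiDAR masking step for a masked autoencoder: voxelize the cloud,
# then hide a random fraction of occupied voxels from the encoder.
import numpy as np

def mask_voxels(points, voxel_size=0.5, mask_ratio=0.7, seed=0):
    """points: (N, 3) -> (visible_points, masked_points)."""
    rng = np.random.default_rng(seed)
    voxels = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(voxels, axis=0, return_inverse=True)
    n_voxels = inverse.max() + 1
    masked = rng.random(n_voxels) < mask_ratio  # per-voxel mask decision
    keep = ~masked[inverse]                     # per-point visibility
    return points[keep], points[~keep]
```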
- Self-supervised 3D Object Detection from Monocular Pseudo-LiDAR [9.361704310981196]
We propose a method for predicting absolute depth and detecting 3D objects using only monocular image sequences.
As a result, the proposed method outperforms existing methods on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-09-20T05:55:49Z)
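Converting a predicted depth map into a pseudo-LiDAR point cloud is a fixed geometric step: every pixel is back-projected through the camera intrinsics using the pinhole model. A minimal sketch follows (the intrinsics are placeholder KITTI-like values).

```python
# Back-project a dense depth map into a pseudo-LiDAR point cloud using the
# pinhole camera model: X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z.
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """depth: (H, W) metric depth map -> (H*W, 3) points in the camera frame."""
    h, w = depth.shape
    vs, us = np.mgrid[0:h, 0:w]   # pixel row (v) and column (u) coordinates
    z = depth
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example with placeholder KITTI-like intrinsics:
points = depth_to_pseudo_lidar(np.ones((375, 1242)), fx=721.5, fy=721.5, cx=609.6, cy=172.9)
```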
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors.
We use the resulting data to train a 3D detector with the same architecture as the baseline model.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
- LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR [40.98198236276633]
Vision-based depth estimation is a key feature in autonomous systems.
In such a monocular setup, dense depth is typically obtained with additional input from one or several expensive LiDARs.
In this paper, we propose a new alternative for densely estimating metric depth by combining a monocular camera with a lightweight LiDAR.
arXiv Detail & Related papers (2021-09-08T12:06:31Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performance on two large-scale datasets, i.e., KITTI and Apollo-SouthBay.
arXiv Detail & Related papers (2020-10-19T09:23:39Z)
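Processing raw LiDAR with 3D convolutions, as SelfVoxeLO does, starts by rasterizing the points into a voxel grid. A minimal occupancy-grid sketch follows; the grid extent and resolution are illustrative choices, not the paper's settings.

```python
# Rasterize a LiDAR point cloud into a binary occupancy voxel grid, the usual
# input representation for 3D convolutions.
import numpy as np
import torch
import torch.nn as nn

def voxelize(points, grid=(64, 64, 16), extent=((-40, 40), (-40, 40), (-3, 1))):
    """points: (N, 3) -> (1, 1, X, Y, Z) float occupancy tensor."""
    vol = np.zeros(grid, dtype=np.float32)
    lo = np.array([e[0] for e in extent], dtype=np.float64)
    hi = np.array([e[1] for e in extent], dtype=np.float64)
    idx = ((points - lo) / (hi - lo) * np.array(grid)).astype(np.int64)
    inside = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    vol[tuple(idx[inside].T)] = 1.0  # mark occupied voxels
    return torch.from_numpy(vol)[None, None]

# A single 3D convolution over the occupancy grid, as in a voxel-based encoder:
features = nn.Conv3d(1, 8, kernel_size=3, padding=1)(voxelize(np.random.randn(1000, 3) * 10))
```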
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.