LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR
- URL: http://arxiv.org/abs/2109.03569v1
- Date: Wed, 8 Sep 2021 12:06:31 GMT
- Title: LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR
- Authors: Florent Bartoccioni, Éloi Zablocki, Patrick Pérez, Matthieu Cord, Karteek Alahari
- Abstract summary: Vision-based depth estimation is a key feature in autonomous systems.
In such a monocular setup, dense depth is obtained either with additional input from one or several expensive LiDARs or with camera-only methods that suffer from scale ambiguity.
In this paper, we propose a new alternative for densely estimating metric depth by combining a monocular camera with a lightweight LiDAR.
- Score: 40.98198236276633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based depth estimation is a key feature in autonomous systems, which
often relies on a single camera or several independent ones. In such a
monocular setup, dense depth is obtained either with additional input from one
or several expensive LiDARs, e.g., with 64 beams, or with camera-only methods,
which suffer from scale-ambiguity and infinite-depth problems. In this paper,
we propose a new alternative for densely estimating metric depth by combining
a monocular camera with a lightweight LiDAR, e.g., with 4 beams, typical of
today's automotive-grade mass-produced laser scanners. Inspired by recent
self-supervised methods, we introduce a novel framework, called LiDARTouch, to
estimate dense depth maps from monocular images with the help of "touches" of
LiDAR, i.e., without the need for dense ground-truth depth. In our setup, the
minimal LiDAR input contributes on three different levels: as an additional
input to the model, in a self-supervised LiDAR reconstruction objective
function, and to estimate pose changes (a key component of self-supervised
depth estimation architectures). Our LiDARTouch framework achieves a new state of the
art in self-supervised depth estimation on the KITTI dataset, thus supporting
our choices of integrating the very sparse LiDAR signal with other visual
features. Moreover, we show that the use of a few-beam LiDAR alleviates scale
ambiguity and infinite-depth issues that camera-only methods suffer from. We
also demonstrate that methods from the fully-supervised depth-completion
literature can be adapted to a self-supervised regime with a minimal LiDAR
signal.
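To make the three-level integration concrete, here is a minimal PyTorch-style sketch. It is not the authors' implementation: `depth_net`, `photo_loss_fn`, and the weight `lambda_lidar` are hypothetical stand-ins, and the third level (LiDAR-aided pose estimation) is only indicated in a comment.

```python
import torch

def lidartouch_style_loss(image, sparse_lidar, depth_net, photo_loss_fn,
                          lambda_lidar=0.5):
    # Level 1: the few-beam LiDAR map enters as an extra input channel.
    depth = depth_net(torch.cat([image, sparse_lidar], dim=1))

    # Photometric term, as in standard self-supervised depth estimation
    # (view synthesis from neighbouring frames, omitted here).
    l_photo = photo_loss_fn(depth)

    # Level 2: L1 reconstruction on the few pixels with a LiDAR return;
    # this anchors the prediction to metric scale.
    mask = (sparse_lidar > 0).float()
    l_lidar = (mask * (depth - sparse_lidar).abs()).sum() / mask.sum().clamp(min=1)

    # Level 3 (not shown): the LiDAR points can also drive inter-frame pose
    # estimation, replacing or regularizing a pose network.
    return l_photo + lambda_lidar * l_lidar
```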
Related papers
- Better Monocular 3D Detectors with LiDAR from the Past [64.6759926054061]
Camera-based 3D detectors often suffer inferior performance compared with their LiDAR-based counterparts, due to inherent depth ambiguities in images.
In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data.
We show consistent and significant performance gains across multiple state-of-the-art models and datasets, with a negligible additional latency of 9.66 ms and a small storage cost.
arXiv Detail & Related papers (2024-04-08T01:38:43Z)
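The aggregation step at the heart of this approach, warping unlabeled past LiDAR sweeps into the current frame to densify its geometry, can be sketched as follows; the pose matrices and point arrays are hypothetical inputs, not the paper's API.

```python
import numpy as np

def aggregate_past_sweeps(past_points, past_poses, current_pose):
    """Warp historical LiDAR sweeps into the current sensor frame.

    past_points: list of (N_i, 3) arrays in their own sensor frames.
    past_poses / current_pose: 4x4 sensor-to-world matrices (assumed given).
    """
    world_to_current = np.linalg.inv(current_pose)
    aggregated = []
    for pts, pose in zip(past_points, past_poses):
        pts_h = np.c_[pts, np.ones(len(pts))]  # homogeneous coordinates
        aggregated.append((world_to_current @ pose @ pts_h.T).T[:, :3])
    return np.vstack(aggregated)  # densified pseudo-scan for the detector
```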
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
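Handling the cameras' unbounded perceptive range typically relies on a coordinate contraction. The sketch below uses the mip-NeRF-360-style mapping as an illustrative assumption for what parameterizing the reconstructed fields may look like; OccNeRF's exact parameterization may differ.

```python
import numpy as np

def contract(x, eps=1e-8):
    """Map unbounded 3D points into a ball of radius 2 (mip-NeRF-360 style).

    Points inside the unit ball are left untouched; outside, the radius is
    compressed so that r -> 2 as the original radius goes to infinity.
    """
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    safe_r = np.maximum(r, eps)
    return np.where(r <= 1.0, x, (2.0 - 1.0 / safe_r) * x / safe_r)
```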
- LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields [112.62936571539232]
We introduce a new task, novel view synthesis for LiDAR sensors.
Traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views.
We use a neural radiance field (NeRF) to facilitate the joint learning of geometry and the attributes of 3D points.
arXiv Detail & Related papers (2023-04-20T15:44:37Z)
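Rendering a LiDAR-like range measurement from a radiance field follows the standard volume-rendering quadrature; the expected-depth estimator below is that standard recipe, not the LiDAR-NeRF code itself.

```python
import numpy as np

def render_depth(sigmas, ts):
    """Expected depth along one ray from sampled densities.

    sigmas: (N,) volume densities at sorted sample distances ts (N,).
    """
    deltas = np.diff(ts, append=ts[-1] + 1e10)   # spacing between samples
    alphas = 1.0 - np.exp(-sigmas * deltas)      # per-sample opacity
    trans = np.cumprod(np.r_[1.0, (1.0 - alphas)[:-1]])  # transmittance
    weights = trans * alphas
    return (weights * ts).sum()                  # expected termination depth
```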
- Weakly Supervised 3D Multi-person Pose Estimation for Large-scale Scenes based on Monocular Camera and Single LiDAR [41.39277657279448]
We propose a monocular camera and single LiDAR-based method for 3D multi-person pose estimation in large-scale scenes.
Specifically, we design an effective fusion strategy to take advantage of multi-modal input data, including images and point clouds.
Our method exploits the inherent geometry constraints of point cloud for self-supervision and utilizes 2D keypoints on images for weak supervision.
arXiv Detail & Related papers (2022-11-30T12:50:40Z)
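Weak supervision from 2D keypoints usually amounts to projecting predicted 3D joints through the camera intrinsics and penalizing the pixel error; a minimal sketch of such a term, under a hypothetical pinhole model:

```python
import numpy as np

def keypoint_reprojection_loss(joints_3d, keypoints_2d, K):
    """2D-keypoint weak-supervision term (illustrative, not the paper's loss).

    joints_3d: (J, 3) predicted joints in the camera frame.
    keypoints_2d: (J, 2) detected 2D keypoints; K: 3x3 intrinsics.
    """
    proj = (K @ joints_3d.T).T                 # pinhole projection
    proj = proj[:, :2] / proj[:, 2:3]          # perspective divide
    return np.abs(proj - keypoints_2d).mean()  # L1 pixel error
```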
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
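Teaching a LiDAR-only detector to "simulate" a LiDAR-image one is commonly implemented as feature- and response-level distillation. The sketch below shows one such formulation; the names and exact losses are assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def simulate_multimodality_loss(student_feat, teacher_feat,
                                student_logits, teacher_logits,
                                temperature=2.0):
    # Feature mimicking: push LiDAR-only features toward fused ones.
    l_feat = F.mse_loss(student_feat, teacher_feat.detach())
    # Response mimicking: match softened classification responses.
    l_resp = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return l_feat + l_resp
```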
- Two-Photon Interference LiDAR Imaging [0.0]
We present a quantum-interference-inspired approach to LiDAR that achieves OCT depth resolutions without the need for high levels of stability.
We demonstrate depth imaging capabilities with an effective impulse response of 70 μm, allowing ranging and multiple reflections to be discerned at much higher resolution than conventional LiDAR approaches.
This enhanced resolution opens up avenues for LiDAR in 3D facial recognition and small-feature detection/tracking, and extends more complex time-of-flight methods such as imaging through obscurants and non-line-of-sight imaging.
arXiv Detail & Related papers (2022-06-20T09:08:51Z)
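To put the 70 μm figure above in perspective: the equivalent round-trip timing precision is sub-picosecond, far finer than the tens-of-picoseconds jitter of typical single-photon detectors, which is why an interferometric scheme is attractive. A quick check:

```python
c = 299_792_458.0          # speed of light, m/s
dz = 70e-6                 # 70 micrometre depth resolution
dt = 2 * dz / c            # equivalent round-trip timing precision
print(f"{dt * 1e15:.0f} fs")   # ~467 fs, vs tens of ps for a typical SPAD
```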
- Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR [22.202192422883122]
We propose a novel two-stage network to advance self-supervised monocular dense depth learning.
Our model fuses monocular image features and sparse LiDAR features to predict initial depth maps.
Our model outperforms the state-of-the-art sparse-LiDAR-based method (Pseudo-LiDAR++) by more than 68% on the downstream task of monocular 3D object detection.
arXiv Detail & Related papers (2021-09-20T15:28:36Z)
- Full Surround Monodepth from Multiple Cameras [31.145598985137468]
We extend self-supervised monocular depth and ego-motion estimation to large-baseline multi-camera rigs.
We learn a single network that generates dense, consistent, and scale-aware point clouds covering the same full 360-degree surround field of view as a typical LiDAR scanner.
arXiv Detail & Related papers (2021-03-31T22:52:04Z)
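Producing a LiDAR-like surround point cloud from per-camera depth maps is a standard unprojection. A minimal sketch, assuming known intrinsics `K` and camera-to-rig extrinsics `T_cam_to_rig` (names are illustrative):

```python
import numpy as np

def depth_to_rig_points(depth, K, T_cam_to_rig):
    """Unproject one camera's depth map into the shared rig frame.

    depth: (H, W) metric depth; K: 3x3 intrinsics; T_cam_to_rig: 4x4.
    Stacking this over all cameras yields a 360-degree point cloud.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = (np.linalg.inv(K) @ pix.T).T          # camera-frame rays
    pts = rays * depth.reshape(-1, 1)            # scale rays by depth
    pts_h = np.c_[pts, np.ones(len(pts))]
    return (T_cam_to_rig @ pts_h.T).T[:, :3]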
- Depth Sensing Beyond LiDAR Range [84.19507822574568]
We propose a novel three-camera system that utilizes small-field-of-view cameras.
Our system, along with our novel algorithm for computing metric depth, does not require full pre-calibration.
It can output dense depth maps with practically acceptable accuracy for scenes and objects at long distances.
arXiv Detail & Related papers (2020-04-07T00:09:51Z)
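The long reach of such a multi-camera system follows from stereo triangulation, depth = f·B/disparity: a long focal length (small field of view) and a wide baseline keep disparities measurable at large distances. Illustrative numbers, not taken from the paper:

```python
f_px = 2000.0      # focal length in pixels (narrow-FoV camera, assumed)
B = 1.5            # baseline between cameras in metres (assumed)
disparity = 0.5    # half-pixel matching accuracy (assumed)
depth = f_px * B / disparity
print(f"max usable range ≈ {depth:.0f} m")  # ≈ 6000 m, far beyond the ~200 m of automotive LiDAR
```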
This list is automatically generated from the titles and abstracts of the papers on this site.