FSNet: Redesign Self-Supervised MonoDepth for Full-Scale Depth
Prediction for Autonomous Driving
- URL: http://arxiv.org/abs/2304.10719v1
- Date: Fri, 21 Apr 2023 03:17:04 GMT
- Authors: Yuxuan Liu, Zhenhua Xu, Huaiyang Huang, Lujia Wang, Ming Liu
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting accurate depth with monocular images is important for low-cost
robotic applications and autonomous driving. This study proposes a
comprehensive self-supervised framework for accurate scale-aware depth
prediction on autonomous driving scenes utilizing inter-frame poses obtained
from inertial measurements. In particular, we introduce a Full-Scale depth
prediction network named FSNet. FSNet contains four important improvements over
existing self-supervised models: (1) a multichannel output representation for
stable training of depth prediction in driving scenarios, (2) an
optical-flow-based mask designed for dynamic object removal, (3) a
self-distillation training strategy to augment the training process, and (4) an
optimization-based post-processing algorithm at test time that fuses the results
from visual odometry. With this framework, robots and vehicles with only one
well-calibrated camera can collect sequences of training image frames and
camera poses, and infer accurate 3D depths of the environment without extra
labeling work or 3D data. Extensive experiments on the KITTI dataset, KITTI-360
dataset and the nuScenes dataset demonstrate the potential of FSNet. More
visualizations are presented at \url{https://sites.google.com/view/fsnet/home}.
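The core of this kind of scale-aware self-supervision is a photometric reprojection loss: predicted depth back-projects each target pixel to 3D, the known metric inter-frame pose (here, from inertial measurements) moves it into the source frame, and the reprojected appearance is compared against the target. Because the translation is metric, only a metric-scale depth minimizes the error. The sketch below is a minimal illustration of that principle in numpy with nearest-neighbour sampling, not FSNet's actual implementation; the function names `reproject` and `photometric_loss` are hypothetical.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Project each pixel of the target frame into the source frame using
    predicted depth and a known relative camera pose (R, t). A metric t
    (e.g. from IMU/odometry) makes the recoverable depth metric as well."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)             # back-project to 3D
    cam_src = R @ cam + t.reshape(3, 1)                             # move into source frame
    proj = K @ cam_src                                              # project to source image
    proj = proj[:2] / np.clip(proj[2:], 1e-6, None)                 # perspective divide
    return proj.reshape(2, H, W)

def photometric_loss(target, source, depth, K, R, t):
    """Mean L1 error between the target image and the source image warped
    by the predicted depth (nearest-neighbour sampling for simplicity)."""
    proj = reproject(depth, K, R, t)
    u = np.clip(np.round(proj[0]).astype(int), 0, source.shape[1] - 1)
    v = np.clip(np.round(proj[1]).astype(int), 0, source.shape[0] - 1)
    warped = source[v, u]
    return np.abs(warped - target).mean()
```

With an identity pose and identical frames the loss is zero; during training, minimizing this loss over many frame pairs drives the depth network toward geometrically consistent, metric-scale predictions.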
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving [67.46481099962088]
Current vision-centric pre-training typically relies on either 2D or 3D pre-text tasks, overlooking the temporal characteristics of autonomous driving as a 4D scene understanding task.
We introduce DriveWorld, which is capable of pre-training from multi-camera driving videos in a spatio-temporal fashion.
DriveWorld delivers promising results on various autonomous driving tasks.
arXiv Detail & Related papers (2024-05-07T15:14:20Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Perspective-aware Convolution for Monocular 3D Object Detection [2.33877878310217]
We propose a novel perspective-aware convolutional layer that captures long-range dependencies in images.
By enforcing convolutional kernels to extract features along the depth axis of every image pixel, we incorporate perspective information into the network architecture.
We demonstrate improved performance on the KITTI3D dataset, achieving 23.9% average precision on the easy benchmark.
arXiv Detail & Related papers (2023-08-24T17:25:36Z)
- Simultaneous Multiple Object Detection and Pose Estimation using 3D Model Infusion with Monocular Vision [21.710141497071373]
Multiple object detection and pose estimation are vital computer vision tasks.
We propose simultaneous neural modeling of both using monocular vision and 3D model infusion.
Our Simultaneous Multiple Object detection and Pose Estimation network (SMOPE-Net) is an end-to-end trainable multitasking network.
arXiv Detail & Related papers (2022-11-21T05:18:56Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data [19.63193201107591]
7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users.
We develop an approach using a weakly supervised method of fine tuning 3D object detectors for traffic observation cameras.
Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top performing monocular 3D object detectors on autonomous vehicle datasets.
arXiv Detail & Related papers (2021-10-21T08:26:48Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
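The rectification idea above can be sketched as a rotation-only warp: a pure camera rotation induces a homography H = K Rᵀ K⁻¹ that is independent of scene depth, so the relative rotation between frames can be cancelled before training. The snippet below is a minimal numpy illustration under an assumed pinhole intrinsics matrix `K`, with nearest-neighbour resampling; the `remove_rotation` helper is hypothetical, not the paper's implementation.

```python
import numpy as np

def remove_rotation(image, K, R):
    """Warp a grayscale image by the rotation-only homography H = K R^T K^{-1},
    cancelling the camera's relative rotation R. Because the homography of a
    pure rotation does not depend on depth, no 3D information is needed."""
    Hm = K @ R.T @ np.linalg.inv(K)
    h, w = image.shape[:2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    src = np.linalg.inv(Hm) @ pix          # inverse warp: where each output pixel samples
    src = src[:2] / src[2:]                # perspective divide
    su = np.clip(np.round(src[0]).astype(int), 0, w - 1)
    sv = np.clip(np.round(src[1]).astype(int), 0, h - 1)
    return image[sv, su].reshape(h, w)     # nearest-neighbour resampling
```

With the identity rotation the warp is a no-op; with the estimated inter-frame rotation it reduces the pose the depth network must implicitly handle to a near-pure translation.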
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
- Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data [0.0]
3D object detection based on monocular camera data is a key enabler for autonomous driving.
Recent deep learning methods show promising results in recovering depth information from single images.
In this paper, we evaluate the performance of a 3D object detection pipeline which is parameterizable with different depth estimation configurations.
arXiv Detail & Related papers (2020-05-15T09:05:17Z)