Neural Rendering based Urban Scene Reconstruction for Autonomous Driving
- URL: http://arxiv.org/abs/2402.06826v1
- Date: Fri, 9 Feb 2024 23:20:23 GMT
- Title: Neural Rendering based Urban Scene Reconstruction for Autonomous Driving
- Authors: Shihao Shen, Louis Kerofsky, Varun Ravi Kumar and Senthil Yogamani
- Abstract summary: We propose multimodal 3D scene reconstruction using a framework that combines neural implicit surfaces and radiance fields.
Dense 3D reconstruction has many applications in automated driving including automated annotation validation.
We demonstrate qualitative and quantitative results on challenging automotive scenes.
- Score: 8.007494499012624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dense 3D reconstruction has many applications in automated driving including
automated annotation validation, multimodal data augmentation, providing ground
truth annotations for systems lacking LiDAR, as well as enhancing auto-labeling
accuracy. LiDAR provides highly accurate but sparse depth, whereas camera
images enable dense depth estimation that is noisy, particularly at long ranges.
In this paper, we harness the strengths of both sensors and propose a
multimodal 3D scene reconstruction framework that combines neural implicit
surfaces and radiance fields. In particular, our method estimates dense and
accurate 3D structures and creates an implicit map representation based on
signed distance fields, which can be further rendered into RGB images and
depth maps. A mesh can be extracted from the learned signed distance field and
culled based on occlusion. Dynamic objects are efficiently filtered on the fly
during sampling using 3D object detection models. We demonstrate qualitative
and quantitative results on challenging automotive scenes.
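The abstract's signed distance field map "can be further rendered into RGB images and depth maps". As a minimal illustrative sketch (not the paper's implementation), depth rendering from an SDF is commonly done by sphere tracing: march each camera ray forward by the SDF value until it reaches the zero level set. All names and scene values below are hypothetical.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 5.0), radius=1.0):
    # Signed distance from point p to a sphere (negative inside).
    dx, dy, dz = (p[i] - center[i] for i in range(3))
    return math.sqrt(dx * dx + dy * dy + dz * dz) - radius

def render_depth(origin, direction, sdf, max_depth=20.0, eps=1e-4):
    # Sphere tracing: advance along the ray by the SDF value, which is
    # a safe step size, until we reach the surface (SDF ~ 0) or give up.
    t = 0.0
    while t < max_depth:
        p = tuple(origin[i] + t * direction[i] for i in range(3))
        d = sdf(p)
        if d < eps:
            return t  # depth at the first surface intersection
        t += d
    return None  # ray missed the scene

# A ray along +z from the origin hits the unit sphere centered at
# z = 5 at depth 4.0.
depth = render_depth((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
```

In a learned setting the analytic `sphere_sdf` would be replaced by a neural network query, and RGB would be obtained by shading or querying a radiance field at the intersection point.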
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes [12.973283255413866]
MM-Gaussian is a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes.
We utilize 3D Gaussian point clouds, with the assistance of pixel-level gradient descent, to fully exploit the color information in photos.
To further bolster the robustness of our system, we designed a relocalization module.
arXiv Detail & Related papers (2024-04-05T11:14:19Z)
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- DeepMIF: Deep Monotonic Implicit Fields for Large-Scale LiDAR 3D Mapping [46.80755234561584]
Recent learning-based methods integrate neural implicit representations and optimizable feature grids to approximate surfaces of 3D scenes.
In this work we depart from fitting LiDAR data exactly, instead letting the network optimize a non-metric monotonic implicit field defined in 3D space.
Our algorithm achieves high-quality dense 3D mapping performance as captured by multiple quantitative and perceptual measures and visual results obtained for Mai City, Newer College, and KITTI benchmarks.
arXiv Detail & Related papers (2024-03-26T09:58:06Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios [0.0]
We propose OCTraN, a transformer architecture that uses iterative-attention to convert 2D image features into 3D occupancy features.
We also develop a self-supervised training pipeline to generalize the model to any scene by eliminating the need for LiDAR ground truth.
arXiv Detail & Related papers (2023-07-20T15:06:44Z)
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, aiming at augmenting the driving scenes on camera in the 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Then, augmented driving scenes can be obtained by placing the 3D objects with adapted location and orientation at the pre-defined valid region of backgrounds.
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose SurroundOcc, a method to predict 3D occupancy from multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function [9.414880946870916]
We propose a novel 3D reconstruction and semantic mapping system using LiDAR and camera sensors.
An adaptive truncated signed distance function is introduced to describe surfaces implicitly, handling varying LiDAR point sparsity.
An optimal image patch selection strategy is proposed to estimate the optimal semantic class for each triangle mesh.
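The truncated signed distance fusion this entry builds on can be sketched in a few lines. This is the classic fixed-truncation variant (the paper's contribution is making the truncation adaptive to LiDAR sparsity); the function names and values are illustrative only.

```python
def truncate(signed_dist, trunc=0.3):
    # Clamp the signed distance to [-trunc, trunc]: only points near
    # the surface carry metric distance; far points saturate.
    return max(-trunc, min(trunc, signed_dist))

def fuse_voxel(tsdf, weight, new_dist, new_weight=1.0, trunc=0.3):
    # Weighted running average of truncated distances for one voxel,
    # the standard TSDF fusion update.
    d = truncate(new_dist, trunc)
    fused = (tsdf * weight + d * new_weight) / (weight + new_weight)
    return fused, weight + new_weight

# A voxel at distance 0.0 (weight 1) observes a new reading of 0.6 m,
# which truncates to 0.3 before averaging.
value, w = fuse_voxel(0.0, 1.0, 0.6)
```

An adaptive variant would choose `trunc` per region, e.g. widening it where LiDAR returns are sparse so that isolated points still constrain the surface.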
arXiv Detail & Related papers (2022-02-28T15:11:25Z)
- Ground-aware Monocular 3D Object Detection for Autonomous Driving [6.5702792909006735]
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a challenging task for low-cost urban autonomous driving and mobile robots.
Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation.
We introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning.
arXiv Detail & Related papers (2021-02-01T08:18:24Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.