MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D
Vehicle Reconstruction in Autonomous Driving
- URL: http://arxiv.org/abs/2309.16715v1
- Date: Mon, 21 Aug 2023 15:48:15 GMT
- Title: MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D
Vehicle Reconstruction in Autonomous Driving
- Authors: Yibo Liu, Kelly Zhu, Guile Wu, Yuan Ren, Bingbing Liu, Yang Liu,
Jinjun Shan
- Abstract summary: We propose a novel framework, dubbed MV-DeepSDF, which estimates the optimal Signed Distance Function (SDF) shape representation from multi-sweep point clouds.
We conduct thorough experiments on two real-world autonomous driving datasets.
- Score: 25.088617195439344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconstructing 3D vehicles from noisy and sparse partial point clouds is of
great significance to autonomous driving. Most existing 3D reconstruction
methods cannot be directly applied to this problem because they are designed for
dense inputs with trivial noise. In this work, we propose
a novel framework, dubbed MV-DeepSDF, which estimates the optimal Signed
Distance Function (SDF) shape representation from multi-sweep point clouds to
reconstruct vehicles in the wild. Although there have been some SDF-based
implicit modeling methods, they only focus on single-view-based reconstruction,
resulting in low fidelity. In contrast, we first analyze multi-sweep
consistency and complementarity in the latent feature space and propose to
transform the implicit space shape estimation problem into an element-to-set
feature extraction problem. Then, we devise a new architecture to extract
individual element-level representations and aggregate them to generate a
set-level predicted latent code. This set-level latent code is an expression of
the optimal 3D shape in the implicit space, and can be subsequently decoded to
a continuous SDF of the vehicle. In this way, our approach learns consistent
and complementary information across multiple sweeps for 3D vehicle reconstruction.
We conduct thorough experiments on two real-world autonomous driving datasets
(Waymo and KITTI) to demonstrate the superiority of our approach over
state-of-the-art alternative methods both qualitatively and quantitatively.
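
To make the element-to-set formulation concrete, the following is a minimal PyTorch sketch under stated assumptions: a PointNet-style encoder maps each partial sweep to an element-level latent, a simple mean pooling aggregates the elements into a set-level code, and a DeepSDF-style MLP decodes that code at query points. Layer sizes and the pooling choice are illustrative, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SweepEncoder(nn.Module):
    """PointNet-style encoder: one partial sweep -> element-level latent."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, pts):               # pts: (B, N, 3)
        feat = self.mlp(pts)              # (B, N, D) per-point features
        return feat.max(dim=1).values     # (B, D) order-invariant pooling

class MVDeepSDFSketch(nn.Module):
    """Multi-sweep inputs -> set-level latent code -> continuous SDF."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = SweepEncoder(latent_dim)
        self.decoder = nn.Sequential(      # DeepSDF-style conditional decoder
            nn.Linear(latent_dim + 3, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1), nn.Tanh(),
        )

    def forward(self, sweeps, queries):
        # sweeps: (B, S, N, 3) multi-sweep point clouds; queries: (B, Q, 3)
        B, S, N, _ = sweeps.shape
        elem = self.encoder(sweeps.view(B * S, N, 3)).view(B, S, -1)
        set_code = elem.mean(dim=1)        # element-to-set aggregation
        code = set_code.unsqueeze(1).expand(-1, queries.size(1), -1)
        return self.decoder(torch.cat([code, queries], dim=-1))  # (B, Q, 1) SDF
```

Mean pooling is only the simplest permutation-invariant aggregator; the paper devises a dedicated architecture for this aggregation step.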
Related papers
- GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs.
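As a rough illustration of that input/latent interface, the sketch below packs V RGB-D-N views (3+1+3 = 7 channels each) through a placeholder encoder into a point-cloud-structured latent, i.e. a set of latent points each carrying a position and a feature vector. Everything here (module names, sizes, the omission of camera poses) is an assumption for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class PointCloudLatentEncoder(nn.Module):
    """Multi-view RGB-D-N renderings -> point-cloud-structured latent."""
    def __init__(self, n_latent_points=512, feat_dim=32):
        super().__init__()
        self.backbone = nn.Conv2d(7, 64, 3, padding=1)  # placeholder per-view CNN
        self.to_latent = nn.Linear(64, 3 + feat_dim)    # xyz + feature per latent point
        self.n = n_latent_points

    def forward(self, views):             # views: (B, V, 7, H, W); poses omitted here
        B, V, C, H, W = views.shape
        f = self.backbone(views.view(B * V, C, H, W))   # (B*V, 64, H, W)
        f = f.view(B, V, 64, -1).permute(0, 1, 3, 2).reshape(B, -1, 64)
        idx = torch.randperm(f.size(1))[: self.n]       # subsample to N latent tokens
        lat = self.to_latent(f[:, idx])                 # (B, N, 3 + feat_dim)
        return lat[..., :3], lat[..., 3:]               # latent positions, features
```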
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- VortSDF: 3D Modeling with Centroidal Voronoi Tesselation on Signed Distance Field [5.573454319150408]
We introduce a volumetric optimization framework that combines explicit SDF fields with a shallow color network, in order to estimate 3D shape properties over tetrahedral grids.
Experimental results with Chamfer statistics validate this approach, showing unprecedented reconstruction quality on scenarios such as objects, open scenes, and humans.
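For context, Chamfer statistics in such evaluations are typically built on the symmetric Chamfer distance between the reconstructed and ground-truth point sets; a minimal NumPy version (squared-distance convention, which varies by paper) looks like this:

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """a: (N, 3) and b: (M, 3) point sets; mean two-way nearest-neighbor cost."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise sq. dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```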
arXiv Detail & Related papers (2024-07-29T09:46:39Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
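One common way to reconcile a bounded sampling grid with an unbounded ("infinite") perception range is a coordinate contraction in the style of mip-NeRF 360; the sketch below shows that generic technique for context, not necessarily OccNeRF's exact parameterization.

```python
import torch

def contract(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Map R^3 into a ball of radius 2: identity inside the unit ball,
    smoothly compressed outside, so arbitrarily distant samples stay
    representable on a finite grid."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.where(norm <= 1.0, x, (2.0 - 1.0 / norm) * x / norm)
```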
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Multi-view 3D Object Reconstruction and Uncertainty Modelling with Neural Shape Prior [9.716201630968433]
3D object reconstruction is important for semantic scene understanding.
It is challenging to reconstruct detailed 3D shapes directly from monocular images due to a lack of depth information, occlusion, and noise.
We tackle this problem by leveraging a neural object representation which learns an object shape distribution from a large dataset of 3D object models and maps it into a latent space.
We propose a method to model uncertainty as part of the representation and define an uncertainty-aware encoder which generates latent codes with uncertainty directly from individual input images.
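A minimal sketch of such an uncertainty-aware encoder, assuming a Gaussian latent with a mean head and a log-variance head (the paper's exact parameterization may differ):

```python
import torch
import torch.nn as nn

class UncertaintyAwareEncoder(nn.Module):
    """Image features -> Gaussian over latent shape codes."""
    def __init__(self, feat_dim=512, latent_dim=128):
        super().__init__()
        self.mu = nn.Linear(feat_dim, latent_dim)
        self.logvar = nn.Linear(feat_dim, latent_dim)

    def forward(self, feat):              # feat: (B, feat_dim) image features
        mu, logvar = self.mu(feat), self.logvar(feat)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # sampled latent code
        return z, mu, logvar              # logvar carries per-dimension uncertainty
```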
arXiv Detail & Related papers (2023-06-17T03:25:13Z)
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme in which a model produces a first guess of the solution that is then refined by optimization.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
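A hedged sketch of such a hybrid inversion loop, with `encoder`, `generator`, and `photometric_loss` as hypothetical stand-ins rather than this paper's actual API: a feed-forward first guess followed by a few gradient refinement steps.

```python
import torch

def invert(image, encoder, generator, photometric_loss, steps=10, lr=1e-2):
    z, pose = encoder(image)                       # first guess from the model
    z = z.detach().requires_grad_(True)
    pose = pose.detach().requires_grad_(True)
    opt = torch.optim.Adam([z, pose], lr=lr)
    for _ in range(steps):                         # "as few as 10 steps"
        opt.zero_grad()
        loss = photometric_loss(generator(z, pose), image)
        loss.backward()
        opt.step()
    return z, pose
```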
arXiv Detail & Related papers (2022-11-21T17:42:42Z)
- AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection [17.526914782562528]
We propose AutoAlignV2, a faster and stronger multi-modal 3D detection framework, built on top of AutoAlign.
Our best model reaches 72.4 NDS on the nuScenes test leaderboard, achieving new state-of-the-art results.
arXiv Detail & Related papers (2022-07-21T06:17:23Z)
- AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation [29.018733252938926]
Powerful priors allow us to perform inference with insufficient information.
We propose an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation.
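The sketch below illustrates how an autoregressive shape prior serves those multimodal tasks: the shape is a sequence of discrete tokens, each predicted from the ones before it, and clamping some tokens to observations turns free generation into completion or reconstruction. `prior_logits` is a hypothetical stand-in for the learned model.

```python
import torch

@torch.no_grad()
def sample_shape_tokens(prior_logits, seq_len, known=None):
    """known: optional {position: token} fixing observed parts of the shape."""
    tokens = []
    for i in range(seq_len):
        if known is not None and i in known:
            tokens.append(known[i])            # keep observed structure (completion)
        else:
            probs = torch.softmax(prior_logits(tokens), dim=-1)  # p(t_i | t_<i)
            tokens.append(torch.multinomial(probs, 1).item())    # sample (generation)
    return tokens
```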
arXiv Detail & Related papers (2022-03-17T17:59:54Z)
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable, and end-to-end trainable.
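A minimal sketch of the single-stage decoding pattern this describes: objects are read off as peaks in a predicted center heatmap, and their properties (box parameters, shape codes) are gathered from dense prediction maps at those peaks. Tensor layouts here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, prop_maps, k=20):
    """heatmap: (1, H, W) center scores; prop_maps: (C, H, W) per-pixel properties."""
    pooled = F.max_pool2d(heatmap[None], 3, stride=1, padding=1)[0]
    peaks = (heatmap == pooled).float() * heatmap   # NMS via max pooling
    scores, idx = peaks.flatten().topk(k)           # top-k candidate centers
    W = heatmap.shape[-1]
    ys = torch.div(idx, W, rounding_mode="floor")
    xs = idx % W
    props = prop_maps[:, ys, xs].T                  # (k, C) properties at the peaks
    return scores, torch.stack([xs, ys], dim=-1), props
```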
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds [76.52448276587707]
We propose Reconfigurable Voxels, a new approach to constructing representations from 3D point clouds.
Specifically, we devise a biased random walk scheme, which adaptively covers each neighborhood with a fixed number of voxels.
We find that this approach effectively improves the stability of voxel features, especially for sparse regions.
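A hedged NumPy sketch of the biased-random-walk idea: from a seed voxel, repeatedly step to a neighbor with probability biased toward occupied voxels until a fixed budget of voxels covers the neighborhood, so sparse regions reach farther for the same budget. The bias weights and 26-neighborhood are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def cover_neighborhood(occ, start, budget=8, rng=None):
    """occ: 3D boolean occupancy grid; start: (i, j, k) seed voxel index."""
    rng = rng or np.random.default_rng(0)
    offsets = np.array([(dx, dy, dz)
                        for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                        for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)])
    covered, cur = {tuple(start)}, np.asarray(start)
    for _ in range(50 * budget):                    # step cap guarantees termination
        if len(covered) >= budget:
            break
        cand = cur + offsets                        # 26-neighborhood of current voxel
        cand = cand[np.all((cand >= 0) & (cand < occ.shape), axis=1)]
        w = 1.0 + 4.0 * occ[cand[:, 0], cand[:, 1], cand[:, 2]]  # prefer occupied
        cur = cand[rng.choice(len(cand), p=w / w.sum())]
        covered.add(tuple(int(c) for c in cur))
    return covered                                  # fixed-size voxel neighborhood
```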
arXiv Detail & Related papers (2020-04-06T15:07:16Z)