Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction
- URL: http://arxiv.org/abs/2406.16289v1
- Date: Mon, 24 Jun 2024 03:30:20 GMT
- Title: Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction
- Authors: Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang,
- Abstract summary: We present a crowd-sourced framework, which utilizes substantial data captured by production vehicles to reconstruct the scene with the NeRF model.
We highlight that we present a comprehensive framework that integrates data selection, sparse 3D reconstruction, sequence appearance embedding, depth supervision of ground surface, and occlusion completion.
We propose an application, named first-view navigation, which leveraged the NeRF model to generate 3D street view and guide the driver with a synthesized video.
- Score: 11.257007940120582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Neural Radiance Fields (NeRF) achieved impressive results in novel view synthesis. Block-NeRF showed the capability of leveraging NeRF to build large city-scale models. For large-scale modeling, a mass of image data is necessary. Collecting images from specially designed data-collection vehicles can not support large-scale applications. How to acquire massive high-quality data remains an opening problem. Noting that the automotive industry has a huge amount of image data, crowd-sourcing is a convenient way for large-scale data collection. In this paper, we present a crowd-sourced framework, which utilizes substantial data captured by production vehicles to reconstruct the scene with the NeRF model. This approach solves the key problem of large-scale reconstruction, that is where the data comes from and how to use them. Firstly, the crowd-sourced massive data is filtered to remove redundancy and keep a balanced distribution in terms of time and space. Then a structure-from-motion module is performed to refine camera poses. Finally, images, as well as poses, are used to train the NeRF model in a certain block. We highlight that we present a comprehensive framework that integrates multiple modules, including data selection, sparse 3D reconstruction, sequence appearance embedding, depth supervision of ground surface, and occlusion completion. The complete system is capable of effectively processing and reconstructing high-quality 3D scenes from crowd-sourced data. Extensive quantitative and qualitative experiments were conducted to validate the performance of our system. Moreover, we proposed an application, named first-view navigation, which leveraged the NeRF model to generate 3D street view and guide the driver with a synthesized video.
Related papers
- Towards Degradation-Robust Reconstruction in Generalizable NeRF [58.33351079982745]
Generalizable Radiance Field (GNeRF) across scenes has been proven to be an effective way to avoid per-scene optimization.
There has been limited research on the robustness of GNeRFs to different types of degradation present in the source images.
arXiv Detail & Related papers (2024-11-18T16:13:47Z) - Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits as it enables to: (1) learn token locations for transformer models; (2) directly regress 3D cameras poses of 2D images with respect to NeRF models.
This in turn leads to an improved performance in all three task of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z) - MegaScenes: Scene-Level View Synthesis at Scale [69.21293001231993]
Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications.
We create a large-scale scene-level dataset from Internet photo collections, called MegaScenes, which contains over 100K structure from motion (SfM) reconstructions from around the world.
We analyze failure cases of state-of-the-art NVS methods and significantly improve generation consistency.
arXiv Detail & Related papers (2024-06-17T17:55:55Z) - LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives [44.06145846507639]
LetsGo is a LiDAR-assisted Gaussian splatting framework for large-scale garage modeling and rendering.
We present the GarageWorld dataset, consisting of eight expansive garage scenes with diverse geometric structures.
We introduce a novel depth regularizer that effectively eliminates floating artifacts in rendered images.
arXiv Detail & Related papers (2024-04-15T12:50:44Z) - DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction
Model [86.37536249046943]
textbfDMV3D is a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion.
Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering.
arXiv Detail & Related papers (2023-11-15T18:58:41Z) - LRM: Large Reconstruction Model for Single Image to 3D [61.47357798633123]
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image.
We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects.
arXiv Detail & Related papers (2023-11-08T00:03:52Z) - A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View
Synthesis and Implicit Scene Reconstruction [26.122654478946227]
Neural Radiance Fields (NeRF) has achieved impressive results in single object scene reconstruction and novel view synthesis.
There is no unified outdoor scene dataset for large-scale NeRF evaluation due to expensive data acquisition and calibration costs.
In this paper, we propose a large-scale outdoor multi-modal dataset, OMMO dataset, containing complex land objects and scenes with calibrated images, point clouds and prompt annotations.
arXiv Detail & Related papers (2023-01-17T10:15:32Z) - CLONeR: Camera-Lidar Fusion for Occupancy Grid-aided Neural
Representations [77.90883737693325]
This paper proposes CLONeR, which significantly improves upon NeRF by allowing it to model large outdoor driving scenes observed from sparse input sensor views.
This is achieved by decoupling occupancy and color learning within the NeRF framework into separate Multi-Layer Perceptrons (MLPs) trained using LiDAR and camera data, respectively.
In addition, this paper proposes a novel method to build differentiable 3D Occupancy Grid Maps (OGM) alongside the NeRF model, and leverage this occupancy grid for improved sampling of points along a ray for rendering in metric space.
arXiv Detail & Related papers (2022-09-02T17:44:50Z) - Ground material classification and for UAV-based photogrammetric 3D data
A 2D-3D Hybrid Approach [1.3359609092684614]
In recent years, photogrammetry has been widely used in many areas to create 3D virtual data representing the physical environment.
These cutting-edge technologies have caught the US Army and Navy's attention for the purpose of rapid 3D battlefield reconstruction, virtual training, and simulations.
arXiv Detail & Related papers (2021-09-24T22:29:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.