GINA-3D: Learning to Generate Implicit Neural Assets in the Wild
- URL: http://arxiv.org/abs/2304.02163v2
- Date: Mon, 28 Aug 2023 06:03:39 GMT
- Title: GINA-3D: Learning to Generate Implicit Neural Assets in the Wild
- Authors: Bokui Shen, Xinchen Yan, Charles R. Qi, Mahyar Najibi, Boyang Deng,
Leonidas Guibas, Yin Zhou, Dragomir Anguelov
- Abstract summary: GINA-3D is a generative model that uses real-world driving data from camera and LiDAR sensors to create 3D implicit neural assets of diverse vehicles and pedestrians.
We construct a large-scale object-centric dataset containing over 1.2M images of vehicles and pedestrians.
We demonstrate that it achieves state-of-the-art performance in quality and diversity for both generated images and geometries.
- Score: 38.51391650845503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling the 3D world from sensor data for simulation is a scalable way of
developing testing and validation environments for robotic learning problems
such as autonomous driving. However, manually creating or re-creating
real-world-like environments is difficult, expensive, and not scalable. Recent
generative model techniques have shown promising progress to address such
challenges by learning 3D assets using only plentiful 2D images -- but still
suffer limitations as they leverage either human-curated image datasets or
renderings from manually-created synthetic 3D environments. In this paper, we
introduce GINA-3D, a generative model that uses real-world driving data from
camera and LiDAR sensors to create realistic 3D implicit neural assets of
diverse vehicles and pedestrians. Compared to existing image datasets, the
real-world driving setting poses new challenges due to occlusions, lighting
variations, and long-tail distributions. GINA-3D tackles these
challenges by decoupling representation learning and generative modeling into
two stages with a learned tri-plane latent structure, inspired by recent
advances in generative modeling of images. To evaluate our approach, we
construct a large-scale object-centric dataset containing over 1.2M images of
vehicles and pedestrians from the Waymo Open Dataset, and a new set of 80K
images of long-tail instances such as construction equipment, garbage trucks,
and cable cars. We compare our model with existing approaches and demonstrate
that it achieves state-of-the-art performance in quality and diversity for both
generated images and geometries.
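The two-stage design above, representation learning over a tri-plane latent followed by generative modeling, hinges on how a tri-plane encodes a 3D asset. The sketch below illustrates the tri-plane query step in PyTorch; all shapes, layer sizes, and names are illustrative assumptions, not GINA-3D's actual implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class TriPlaneDecoder(nn.Module):
    """Minimal sketch of querying a tri-plane latent: features for a 3D
    point are gathered by bilinearly sampling three axis-aligned feature
    planes (XY, XZ, YZ), summing them, and decoding with a small MLP.
    Shapes and sizes here are illustrative, not GINA-3D's actual config."""

    def __init__(self, channels: int = 32, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density, as in typical NeRF decoders
        )

    def forward(self, planes: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
        # planes: (B, 3, C, H, W) stacked XY/XZ/YZ planes; xyz: (B, N, 3) in [-1, 1]
        coords = torch.stack(
            [xyz[..., [0, 1]], xyz[..., [0, 2]], xyz[..., [1, 2]]], dim=1
        )  # (B, 3, N, 2): projection of each point onto the three planes
        B, _, C, H, W = planes.shape
        feats = F.grid_sample(
            planes.reshape(B * 3, C, H, W),
            coords.reshape(B * 3, -1, 1, 2),
            align_corners=True,
        )  # (B*3, C, N, 1) bilinearly sampled plane features
        feats = feats.reshape(B, 3, C, -1).sum(dim=1).transpose(1, 2)  # (B, N, C)
        return self.mlp(feats)  # (B, N, 4): RGB + density per query point

# In a two-stage pipeline, stage 1 would learn an encoder producing `planes`
# from sensor data, and stage 2 would fit a generative model (e.g. over
# discretized plane tokens) to sample new assets.
decoder = TriPlaneDecoder()
planes = torch.randn(1, 3, 32, 64, 64)   # a (hypothetical) tri-plane latent
points = torch.rand(1, 1024, 3) * 2 - 1  # query points in [-1, 1]^3
rgb_sigma = decoder(planes, points)      # (1, 1024, 4)
```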
Related papers
- RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image [30.049602796278133]
High-quality 3D car assets are essential for various applications, including video games, autonomous driving, and virtual reality.
Current 3D generation methods that use NeRF or 3D-GS as representations for 3D objects generate Lambertian objects under fixed lighting.
We propose a novel relightable 3D object generative framework that automates the creation of 3D car assets from a single input image.
arXiv Detail & Related papers (2024-10-10T17:54:03Z)
- Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text [61.9973218744157]
We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories.
Experiments demonstrate that Director3D outperforms existing methods, offering superior performance in real-world 3D generation.
arXiv Detail & Related papers (2024-06-25T14:42:51Z)
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits as it enables to: (1) learn token locations for transformer models; (2) directly regress 3D cameras poses of 2D images with respect to NeRF models.
This in turn leads to an improved performance in all three task of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network that synthesizes new content at higher quality by exploiting the natural-image prior of a 2D diffusion model and the global 3D information of the current scene.
Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach, termed Drive-3DAug, aimed at augmenting camera-view driving scenes in 3D space.
We first utilize Neural Radiance Fields (NeRF) to reconstruct 3D models of the background and foreground objects.
Augmented driving scenes can then be obtained by placing the 3D objects, with adapted locations and orientations, in pre-defined valid regions of the backgrounds (a minimal placement sketch follows this entry).
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
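For the placement step described in the Drive-3DAug summary above, here is a minimal sketch of rotating and translating a reconstructed object's points, then validating the result against a bird's-eye-view valid-region mask. The function name, mask convention, and grid origin are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

def place_object(obj_points: np.ndarray, yaw: float, translation: np.ndarray,
                 valid_mask: np.ndarray, cell_size: float):
    """Rotate and translate an object's points; accept the placement only
    if its ground footprint lies inside a pre-defined valid region."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotate about z (up)
    placed = obj_points @ R.T + translation

    # Footprint cells in a bird's-eye-view grid; grid origin at (0, 0) for simplicity.
    ij = np.floor(placed[:, :2] / cell_size).astype(int)
    inside = (
        (ij[:, 0] >= 0) & (ij[:, 0] < valid_mask.shape[0])
        & (ij[:, 1] >= 0) & (ij[:, 1] < valid_mask.shape[1])
    )
    if not inside.all() or not valid_mask[ij[:, 0], ij[:, 1]].all():
        return None  # placement rejected: object leaves the valid region
    return placed

# Toy usage with a fully valid mask.
obj = np.random.default_rng(1).normal(size=(100, 3))
result = place_object(obj, yaw=np.pi / 4, translation=np.array([10.0, 10.0, 0.0]),
                      valid_mask=np.ones((50, 50), dtype=bool), cell_size=0.5)
```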
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians.
Our method efficiently makes use of these complementary camera and LiDAR signals in a semi-supervised fashion and outperforms existing methods by a large margin.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages (a fusion sketch follows this entry).
arXiv Detail & Related papers (2022-12-15T11:15:14Z)
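For the pixel-aligned multi-modal features mentioned in the HUM3DIL summary, a minimal sketch follows: project LiDAR points into the image with pinhole intrinsics, bilinearly sample a feature map at the projected pixels, and pass the fused per-point tokens through a Transformer encoder. The shapes, layer sizes, and single-camera setup are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn.functional as F
from torch import nn

def pixel_aligned_features(points: torch.Tensor, K: torch.Tensor,
                           feat_map: torch.Tensor) -> torch.Tensor:
    """points: (N, 3) in camera frame with z > 0; K: (3, 3) intrinsics;
    feat_map: (C, H, W) image features. Returns (N, 3 + C) fused tokens."""
    uv = (K @ points.T).T                     # (N, 3) homogeneous pixel coords
    uv = uv[:, :2] / uv[:, 2:3]               # perspective divide -> (N, 2)
    H, W = feat_map.shape[1:]
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(feat_map[None], grid[None, :, None, :],
                            align_corners=True)   # (1, C, N, 1)
    img_feats = sampled[0, :, :, 0].T             # (N, C) per-point image features
    return torch.cat([points, img_feats], dim=-1)

# The fused per-point tokens then go through Transformer stages; the layer
# sizes below are placeholders, not HUM3DIL's actual configuration.
C = 64
points = torch.rand(256, 3) + torch.tensor([0.0, 0.0, 5.0])  # points in front of camera
K = torch.tensor([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
tokens = pixel_aligned_features(points, K, torch.randn(C, 480, 640))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=3 + C, nhead=1, batch_first=True), num_layers=2
)
refined = encoder(tokens[None])  # (1, 256, 67) refined per-point features
```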
- Deep Generative Models on 3D Representations: A Survey [81.73385191402419]
Generative models aim to learn the distribution of observed data by generating new instances.
Recently, researchers have started to shift focus from 2D to 3D space, where representing 3D data poses significantly greater challenges.
arXiv Detail & Related papers (2022-10-27T17:59:50Z)
- Using Adaptive Gradient for Texture Learning in Single-View 3D Reconstruction [0.0]
Learning-based approaches to 3D model reconstruction have attracted attention owing to their modern applications.
We present a novel sampling algorithm that optimizes the gradient of predicted coordinates based on the variance of the sampling image.
We also adopt the Fréchet Inception Distance (FID) to form a loss function for learning, which helps bridge the gap between rendered images and input images (a sketch of the underlying Fréchet distance follows this entry).
arXiv Detail & Related papers (2021-04-29T07:52:54Z)
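The last entry's FID-based loss rests on the Fréchet distance between Gaussian statistics of deep features. Below is a minimal, self-contained sketch of that distance; the paper's actual loss formulation may differ, and the 8-D toy features stand in for the usual 2048-D Inception features purely for speed.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians, the quantity behind FID:
    d^2 = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).
    In FID, (mu, sigma) are the mean and covariance of Inception features
    of each image set; here they are passed in directly."""
    covmean = linalg.sqrtm(sigma1 @ sigma2).real  # drop tiny imaginary parts
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Toy check with feature statistics from two random sets.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(500, 8)), rng.normal(loc=0.5, size=(500, 8))
fid = frechet_distance(a.mean(0), np.cov(a, rowvar=False),
                       b.mean(0), np.cov(b, rowvar=False))
print(f"FID-style distance: {fid:.3f}")
```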