CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting
- URL: http://arxiv.org/abs/2401.18075v2
- Date: Fri, 19 Jul 2024 21:20:35 GMT
- Title: CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting
- Authors: Jiezhi Yang, Khushi Desai, Charles Packer, Harshil Bhatia, Nicholas Rhinehart, Rowan McAllister, Joseph Gonzalez
- Abstract summary: We propose CARFF, a method for predicting future 3D scenes given past observations.
We employ a two-stage training of Pose-Conditional-VAE and NeRF to learn 3D representations.
We demonstrate the utility of our method in scenarios using the CARLA driving simulator.
- Score: 15.392692128626809
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose CARFF, a method for predicting future 3D scenes given past observations. Our method maps 2D ego-centric images to a distribution over plausible 3D latent scene configurations and predicts the evolution of hypothesized scenes through time. Our latents condition a global Neural Radiance Field (NeRF) to represent a 3D scene model, enabling explainable predictions and straightforward downstream planning. This approach models the world as a POMDP and considers complex scenarios of uncertainty in environmental states and dynamics. Specifically, we employ a two-stage training of Pose-Conditional-VAE and NeRF to learn 3D representations, and auto-regressively predict latent scene representations utilizing a mixture density network. We demonstrate the utility of our method in scenarios using the CARLA driving simulator, where CARFF enables efficient trajectory and contingency planning in complex multi-agent autonomous driving scenarios involving occlusions.
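The abstract's core forecasting step, auto-regressively predicting the next latent scene code with a mixture density network (MDN), can be illustrated with a minimal sketch. This is not the paper's implementation: the latent dimension, component count, and linear weights below are hypothetical stand-ins for the learned PC-VAE latents and MDN parameters; sampling a mixture component corresponds to picking one hypothesized scene configuration, which would then condition the global NeRF.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): latent scene-code dimension
# and number of mixture components (scene hypotheses).
LATENT_DIM = 8
N_COMPONENTS = 3

def mdn_forward(z, W_pi, W_mu, W_sigma):
    """Map the current latent z_t to mixture parameters for z_{t+1}.

    Returns mixture weights pi (K,), component means mu (K, D), and
    per-component diagonal std devs sigma (K, D).
    """
    logits = W_pi @ z
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                          # softmax -> mixture weights
    mu = (W_mu @ z).reshape(N_COMPONENTS, LATENT_DIM)
    sigma = np.exp((W_sigma @ z).reshape(N_COMPONENTS, LATENT_DIM))
    return pi, mu, sigma

def sample_next_latent(z, W_pi, W_mu, W_sigma):
    """One auto-regressive step: sample z_{t+1} ~ MDN(z_t)."""
    pi, mu, sigma = mdn_forward(z, W_pi, W_mu, W_sigma)
    k = rng.choice(N_COMPONENTS, p=pi)      # pick one scene hypothesis
    return mu[k] + sigma[k] * rng.standard_normal(LATENT_DIM)

# Toy random weights; in CARFF these would be learned on PC-VAE latents.
W_pi = 0.1 * rng.standard_normal((N_COMPONENTS, LATENT_DIM))
W_mu = 0.1 * rng.standard_normal((N_COMPONENTS * LATENT_DIM, LATENT_DIM))
W_sigma = 0.1 * rng.standard_normal((N_COMPONENTS * LATENT_DIM, LATENT_DIM))

# Roll the latent forward three steps; each sampled latent would
# condition the NeRF to render a hypothesized future 3D scene.
z = rng.standard_normal(LATENT_DIM)
trajectory = [z]
for _ in range(3):
    z = sample_next_latent(z, W_pi, W_mu, W_sigma)
    trajectory.append(z)
print(len(trajectory), trajectory[-1].shape)
```

Sampling a discrete component before drawing from its Gaussian is what lets an MDN keep multiple distinct futures (e.g. an occluded car turning vs. going straight) rather than averaging them into one blurry prediction.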
Related papers
- WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction [9.639795825672023]
Off-road environments are rich in geometric information and are therefore well suited to 3D semantic occupancy prediction tasks.
We introduce WildOcc, the first benchmark to provide dense occupancy annotations for off-road 3D semantic occupancy prediction tasks.
The paper also proposes a ground-truth generation pipeline that employs coarse-to-fine reconstruction to achieve more realistic results.
arXiv Detail & Related papers (2024-10-21T09:02:40Z)
- AdaOcc: Adaptive-Resolution Occupancy Prediction [20.0994984349065]
We introduce AdaOcc, a novel adaptive-resolution, multi-modal prediction approach.
Our method integrates object-centric 3D reconstruction and holistic occupancy prediction within a single framework.
In close-range scenarios, we surpass previous baselines by over 13% in IoU and over 40% in Hausdorff distance.
arXiv Detail & Related papers (2024-08-24T03:46:25Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
- GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis [71.24791230358065]
We introduce a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis.
GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes.
Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
arXiv Detail & Related papers (2024-05-30T06:47:55Z)
- Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution [4.204990010424084]
In autonomous vehicles, understanding the surrounding 3D environment of the ego vehicle in real-time is essential.
State-of-the-art 3D mapping methods leverage transformers with cross-attention mechanisms to lift 2D vision-centric camera features into the 3D domain.
This paper introduces an approach that extracts features from front-view 2D camera images and LiDAR scans, then employs a sparse convolution network (Minkowski Engine) for 3D semantic occupancy prediction.
arXiv Detail & Related papers (2024-03-13T17:50:59Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning [132.20119288212376]
We propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system.
arXiv Detail & Related papers (2022-07-15T16:57:43Z)
- ONCE-3DLanes: Building Monocular 3D Lane Detection [41.46466150783367]
We present ONCE-3DLanes, a real-world autonomous driving dataset with lane layout annotation in 3D space.
By exploiting the explicit relationship between point clouds and image pixels, a dataset annotation pipeline is designed to automatically generate high-quality 3D lane locations.
arXiv Detail & Related papers (2022-04-30T16:35:25Z)
- Learning Continuous Environment Fields via Implicit Functions [144.4913852552954]
We propose a novel scene representation that encodes reaching distance -- the distance between any position in the scene to a goal along a feasible trajectory.
We demonstrate that this environment field representation can directly guide the dynamic behaviors of agents in 2D mazes or 3D indoor scenes.
arXiv Detail & Related papers (2021-11-27T22:36:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.