DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input
- URL: http://arxiv.org/abs/2409.12753v2
- Date: Sat, 21 Dec 2024 06:32:24 GMT
- Title: DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input
- Authors: Qijian Tian, Xin Tan, Yuan Xie, Lizhuang Ma
- Abstract summary: We propose a feed-forward Gaussian Splatting model that reconstructs driving scenes from flexible surround-view input.
We jointly train a pose network, a depth network, and a Gaussian network to predict the primitives that represent the driving scenes.
Our model outperforms existing state-of-the-art feed-forward and scene-optimized reconstruction methods in terms of reconstruction quality.
- Score: 45.04354435388718
- Abstract: We propose DrivingForward, a feed-forward Gaussian Splatting model that reconstructs driving scenes from flexible surround-view input. Driving scene images from vehicle-mounted cameras are typically sparse, with limited overlap, and the movement of the vehicle further complicates the acquisition of camera extrinsics. To tackle these challenges and achieve real-time reconstruction, we jointly train a pose network, a depth network, and a Gaussian network to predict the Gaussian primitives that represent the driving scenes. The pose network and depth network determine the position of the Gaussian primitives in a self-supervised manner, without using depth ground truth and camera extrinsics during training. The Gaussian network independently predicts primitive parameters from each input image, including covariance, opacity, and spherical harmonics coefficients. At the inference stage, our model can achieve feed-forward reconstruction from flexible multi-frame surround-view input. Experiments on the nuScenes dataset show that our model outperforms existing state-of-the-art feed-forward and scene-optimized reconstruction methods in terms of reconstruction quality.
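To make the described pipeline concrete, below is a minimal sketch of how per-image networks could produce per-pixel Gaussian primitives. Module layouts, channel counts, and the unprojection helper are illustrative assumptions, not the authors' implementation; the pose network used for self-supervision is omitted.
```python
# Minimal sketch (assumed architecture, not the authors' code) of a feed-forward
# pipeline that maps N surround-view images to per-pixel 3D Gaussian primitives.
# A pose network (omitted here) would supply frame-to-frame extrinsics for the
# self-supervised photometric training signal described in the abstract.
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Predicts a positive depth map for each input image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 1, 3, padding=1), nn.Softplus())

    def forward(self, images):              # images: (N, 3, H, W)
        return self.net(images)             # depth:  (N, 1, H, W)

class GaussianNet(nn.Module):
    """Predicts per-pixel Gaussian parameters: scale, rotation, opacity, SH."""
    def __init__(self, sh_degree=3):
        super().__init__()
        sh_dim = 3 * (sh_degree + 1) ** 2            # RGB spherical harmonics coefficients
        out_ch = 3 + 4 + 1 + sh_dim                  # scale + quaternion + opacity + SH
        self.net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, out_ch, 3, padding=1))

    def forward(self, images):               # images: (N, 3, H, W)
        return self.net(images)              # params: (N, out_ch, H, W)

def unproject(depth, K):
    """Lifts per-pixel depth to 3D points (Gaussian centers) in the camera frame."""
    N, _, H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).float()   # (3, H, W) homogeneous pixels
    rays = torch.einsum("ij,jhw->ihw", K.inverse(), pix)           # camera rays per pixel
    return rays.unsqueeze(0) * depth                               # (N, 3, H, W) Gaussian centers
```
Because every network runs once per image with no test-time optimization, such a design stays feed-forward and can accept a variable number of surround-view frames at inference.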
Related papers
- OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving [12.47557991785691]
We propose OG-Gaussian, a novel approach that replaces LiDAR point clouds with Occupancy Grids (OGs) generated from surround-view camera images.
Our method leverages the semantic information in OGs to separate dynamic vehicles from static street background, converting these grids into two distinct sets of initial point clouds for reconstructing both static and dynamic objects.
Experiments on the Waymo Open dataset demonstrate that OG-Gaussian is on par with the current state-of-the-art in terms of reconstruction quality and rendering speed, achieving an average PSNR of 35.13 and a rendering speed of 143 FPS.
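A rough illustration of the occupancy-grid idea follows; the label id, grid layout, and function names are assumptions for the sketch, not the paper's implementation.
```python
# Hypothetical illustration (not the paper's implementation) of splitting an occupancy
# grid into static and dynamic initial point clouds using semantic voxel labels.
import numpy as np

VEHICLE_LABEL = 1  # assumed label id for dynamic vehicles

def occupancy_to_point_clouds(occupancy, semantics, voxel_size, origin):
    """occupancy: (X, Y, Z) bool grid; semantics: (X, Y, Z) int labels;
    origin and voxel_size define the grid-to-world mapping (assumed layout)."""
    idx = np.argwhere(occupancy)                     # indices of occupied voxels
    centers = origin + (idx + 0.5) * voxel_size      # voxel centers in world coordinates
    labels = semantics[occupancy]                    # one label per occupied voxel
    dynamic = centers[labels == VEHICLE_LABEL]       # initial points for moving vehicles
    static = centers[labels != VEHICLE_LABEL]        # initial points for the street background
    return static, dynamic
```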
arXiv Detail & Related papers (2025-02-20T04:00:47Z)
- FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [59.77970844874235]
We present FreeSplatter, a feed-forward reconstruction framework capable of generating high-quality 3D Gaussians from sparse-view images.
FreeSplatter is built upon a streamlined transformer architecture, comprising sequential self-attention blocks.
We show FreeSplatter's potential in enhancing the productivity of downstream applications, such as text/image-to-3D content creation.
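A minimal sketch of a stack of self-attention blocks applied jointly to tokens from all input views; dimensions and depth are assumptions, not FreeSplatter's configuration.
```python
# Minimal sketch of sequential self-attention blocks over concatenated multi-view tokens.
import torch
import torch.nn as nn

class SelfAttnBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                    # x: (B, num_views * num_patches, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

# Patch tokens from all sparse views share one sequence, so every block lets
# patches from different views exchange information directly.
encoder = nn.Sequential(*[SelfAttnBlock() for _ in range(12)])
```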
arXiv Detail & Related papers (2024-12-12T18:52:53Z)
- PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence [3.61512056914095]
We present PreF3R, Pose-Free Feed-forward 3D Reconstruction from an image sequence of variable length.
PreF3R removes the need for camera calibration and reconstructs the 3D Gaussian field within a canonical coordinate frame directly from a sequence of unposed images.
arXiv Detail & Related papers (2024-11-25T19:16:29Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
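The hypernetwork idea can be sketched as a conditioning embedding (e.g. derived from a geometry-encoding volume) that is mapped to the weights of a tiny SDF MLP; dimensions and layer layout below are illustrative assumptions.
```python
# Hypothetical sketch: a hypernetwork predicts the weights of a small SDF MLP
# from a conditioning embedding, so the SDF generalizes across objects.
import torch
import torch.nn as nn

class SDFHyperNetwork(nn.Module):
    def __init__(self, cond_dim=512, hidden=64):
        super().__init__()
        self.hidden = hidden
        # Predict weights/bias of the first layer: 3D point -> hidden features.
        self.w1 = nn.Linear(cond_dim, hidden * 3)
        self.b1 = nn.Linear(cond_dim, hidden)
        # Predict weights/bias of the output layer: hidden features -> signed distance.
        self.w2 = nn.Linear(cond_dim, hidden)
        self.b2 = nn.Linear(cond_dim, 1)

    def forward(self, cond, points):                 # cond: (B, cond_dim), points: (B, N, 3)
        W1 = self.w1(cond).view(-1, self.hidden, 3)
        h = torch.relu(torch.einsum("bhc,bnc->bnh", W1, points) + self.b1(cond)[:, None])
        W2 = self.w2(cond).view(-1, 1, self.hidden)
        sdf = torch.einsum("boh,bnh->bno", W2, h) + self.b2(cond)[:, None]
        return sdf                                    # (B, N, 1) signed distance per query point
```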
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes [57.12439406121721]
We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes.
For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene.
We then leverage a composite dynamic Gaussian graph to handle multiple moving objects.
We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes with greater details and maintain panoramic consistency.
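A simplified sketch of composing static-background and per-object dynamic Gaussians at render time; the data layout is an assumption for illustration, not DrivingGaussian's actual interface.
```python
# Illustrative composition step: dynamic objects' Gaussians are moved into the scene
# by their pose at the rendered timestamp, then rasterized together with the static set.
import torch

def compose_scene(static_means, objects, timestamp):
    """static_means: (Ns, 3); objects: list of dicts with 'means' (No, 3) and a
    'pose_at' callable returning (R: (3, 3), t: (3,)) at a timestamp (assumed layout)."""
    all_means = [static_means]
    for obj in objects:
        R, t = obj["pose_at"](timestamp)
        all_means.append(obj["means"] @ R.T + t)   # object Gaussians into world coordinates
    return torch.cat(all_means, dim=0)             # one point set rasterized together
```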
arXiv Detail & Related papers (2023-12-13T06:30:51Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- FSNet: Redesign Self-Supervised MonoDepth for Full-Scale Depth Prediction for Autonomous Driving [18.02943016671203]
This study proposes a comprehensive self-supervised framework for accurate scale-aware depth prediction on autonomous driving scenes.
In particular, we introduce a Full-Scale depth prediction network named FSNet.
With FSNet, robots and vehicles with only one well-calibrated camera can collect sequences of training image frames and camera poses, and infer accurate 3D depths of the environment without extra labeling work or 3D data.
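The self-supervised signal such frameworks rely on can be sketched as a photometric reprojection loss; the version below is a generic illustration with assumed shapes and names, not FSNet's exact formulation.
```python
# Sketch of a standard self-supervised monocular-depth signal: warp a source frame
# into the target view using predicted depth and the known relative pose, then
# penalize the photometric difference.
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, K, R, t):
    """target/source: (1, 3, H, W); depth: (1, 1, H, W); K: (3, 3); R, t: relative pose."""
    _, _, H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], 0).float().view(3, -1)   # (3, H*W)
    cam = K.inverse() @ pix * depth.view(1, -1)          # back-project target pixels
    cam = R @ cam + t.view(3, 1)                         # move points into the source frame
    proj = K @ cam
    uv = proj[:2] / proj[2:].clamp(min=1e-6)             # perspective divide
    # Normalize to [-1, 1] for grid_sample and resample the source image.
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1, uv[1] / (H - 1) * 2 - 1], -1)
    warped = F.grid_sample(source, grid.view(1, H, W, 2), align_corners=True)
    return (warped - target).abs().mean()
```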
arXiv Detail & Related papers (2023-04-21T03:17:04Z)
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
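A hypothetical sketch of turning pairwise image features and candidate rotations into a probability distribution; the scoring network and candidate sampling are assumptions, not RelPose's exact energy-based formulation.
```python
# Hypothetical sketch: score candidate relative rotations for an image pair and
# normalize the scores into a distribution over rotations.
import torch
import torch.nn as nn

class RotationScorer(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        # Maps (pair feature, flattened 3x3 rotation) to an unnormalized score.
        self.mlp = nn.Sequential(nn.Linear(feat_dim + 9, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, pair_feat, rotations):          # pair_feat: (D,), rotations: (K, 3, 3)
        K = rotations.shape[0]
        feats = pair_feat.unsqueeze(0).expand(K, -1)  # share the pair feature across candidates
        x = torch.cat([feats, rotations.reshape(K, 9)], dim=1)
        logits = self.mlp(x).squeeze(-1)              # one score per candidate rotation
        return torch.softmax(logits, dim=0)           # probability over the K candidates
```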
arXiv Detail & Related papers (2022-08-11T17:59:59Z)