Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting
- URL: http://arxiv.org/abs/2406.01042v2
- Date: Thu, 11 Jul 2024 15:02:38 GMT
- Title: Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting
- Authors: Fang Li, Hao Zhang, Narendra Ahuja
- Abstract summary: We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters.
It includes the extraction of 2D point features that robustly represent 3D structure.
Results show significant improvements over state-of-the-art methods for 4D novel view synthesis.
- Score: 14.759265492381509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian Splatting (GS) has significantly elevated scene reconstruction efficiency and novel view synthesis (NVS) accuracy compared to Neural Radiance Fields (NeRF), particularly for dynamic scenes. However, current 4D NVS methods, whether based on GS or NeRF, primarily rely on camera parameters provided by COLMAP and even use sparse point clouds generated by COLMAP for initialization, which lack accuracy and are time-consuming to compute. This sometimes results in poor dynamic scene representation, especially in scenes with large object movements or extreme camera conditions, e.g., small translations combined with large rotations. Some studies simultaneously optimize the estimation of camera parameters and scenes, supervised by additional information such as depth and optical flow obtained from off-the-shelf models. Using this unverified information as ground truth can reduce robustness and accuracy, which frequently occurs for long monocular videos (e.g., with hundreds of frames or more). We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. It includes the extraction of 2D point features that robustly represent 3D structure, and their use for subsequent joint optimization of camera parameters and 3D structure towards overall 4D scene optimization. We demonstrate the accuracy and time efficiency of our method through extensive quantitative and qualitative experimental results on several standard benchmarks. The results show significant improvements over state-of-the-art methods for 4D novel view synthesis. The source code will be released soon at https://github.com/fangli333/SC-4DGS.
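The abstract describes jointly optimizing camera parameters and 3D structure from tracked 2D point features. A common objective for this kind of problem is the reprojection error of a pinhole camera; the sketch below illustrates that generic objective only and is not the authors' implementation. The toy scene, the single-axis rotation, and all function names (`rotate_y`, `project`, `reprojection_error`) are assumptions made for illustration.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the y-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(point, focal, angle, translation):
    """Pinhole projection of a world point for a camera with the given
    focal length (intrinsic) and pose (rotation angle, translation)."""
    x, y, z = rotate_y(point, angle)
    x, y, z = x + translation[0], y + translation[1], z + translation[2]
    return (focal * x / z, focal * y / z)


def reprojection_error(points, observations, focal, angle, translation):
    """Mean squared pixel distance between projected 3D points and the
    observed 2D features; a joint optimizer would minimize this over the
    camera parameters and the 3D points together."""
    total = 0.0
    for p, (u_obs, v_obs) in zip(points, observations):
        u, v = project(p, focal, angle, translation)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total / len(points)


# Hypothetical toy scene: four 3D points seen by one camera.
points = [(0.0, 0.0, 4.0), (1.0, -0.5, 5.0), (-1.0, 0.5, 6.0), (0.5, 1.0, 3.0)]
true_focal, true_angle, true_t = 500.0, 0.1, (0.2, 0.0, 0.5)
observations = [project(p, true_focal, true_angle, true_t) for p in points]

# The error vanishes at the true parameters and grows for a wrong guess.
err_true = reprojection_error(points, observations, true_focal, true_angle, true_t)
err_wrong = reprojection_error(points, observations, 450.0, 0.0, (0.0, 0.0, 0.0))
print(err_true, err_wrong)
```

In a self-calibrating setting, the focal length and pose would start from a rough guess like the second call and be refined by minimizing this error, rather than being read from COLMAP.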
Related papers
- GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving a very high rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
- Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization [11.418632671254564]
3D Gaussian Splatting has emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images.
We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals.
We show results on real-world scenes and complex trajectories through simulated environments.
arXiv Detail & Related papers (2024-10-11T12:01:15Z)
- A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose [44.13819148680788]
We develop a novel construct-and-optimize method for sparse view synthesis without camera poses.
Specifically, we construct a solution by using monocular depth and projecting pixels back into the 3D world.
We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views.
arXiv Detail & Related papers (2024-05-06T17:36:44Z)
- SAGS: Structure-Aware 3D Gaussian Splatting [53.6730827668389]
We propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene.
SAGS achieves state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets.
arXiv Detail & Related papers (2024-04-29T23:26:30Z)
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- 4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes [33.14021987166436]
We introduce 4DRotorGS, a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians.
As an explicit spatial-temporal representation, 4DRotorGS demonstrates powerful capabilities for modeling complicated dynamics and fine details.
We further implement our temporal slicing and acceleration framework, achieving real-time rendering speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU.
arXiv Detail & Related papers (2024-02-05T18:59:04Z)
- Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [94.07744207257653]
We focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects.
We combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization.
arXiv Detail & Related papers (2023-12-21T11:41:02Z)
- 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering [103.32717396287751]
We propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes.
A neural voxel encoding algorithm inspired by HexPlane is proposed to efficiently build features from 4D neural voxels.
Our 4D-GS method achieves real-time rendering under high resolutions, 82 FPS at an 800$\times$800 resolution on an RTX 3090 GPU.
arXiv Detail & Related papers (2023-10-12T17:21:41Z)
- Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes [8.061773364318313]
We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video.
We provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences.
This represents a strong new performance point for crowded scenes, an important setting for computer vision.
arXiv Detail & Related papers (2023-09-15T17:44:07Z)