Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges
- URL: http://arxiv.org/abs/2511.05152v1
- Date: Fri, 07 Nov 2025 11:07:34 GMT
- Title: Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges
- Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull
- Abstract summary: Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. We introduce an approach that splits canonical GS into foreground and background components using a sparse set of masks for frames at t=0. Our method produces SotA qualitative and quantitative results: up to 3 dB higher PSNR with half the model size on 3-D scenes.
- Score: 6.1973784462444685
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in filmmaking, tight budgets can result in sparse camera configurations, which limits state-of-the-art (SotA) methods when capturing complex dynamic features. To address this issue, we introduce an approach that splits the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks for frames at t=0. Each representation is trained separately on different loss functions during canonical pre-training. Then, during dynamic training, different parameters are modeled for each deformation field, following common filmmaking practices. The foreground stage contains diverse dynamic features, so changes in color, position, and rotation are learned. The background, which contains film crew and equipment, is typically dimmer and less dynamic, so only changes in point position are learned. Experiments on 3-D and 2.5-D entertainment datasets show that our method produces SotA qualitative and quantitative results: up to 3 dB higher PSNR with half the model size on 3-D scenes. Unlike the SotA, and without the need for dense mask supervision, our method also produces segmented dynamic reconstructions including transparent and dynamic textures. Code and video comparisons are available online: https://interims-git.github.io/
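As a rough illustration of the split described above, here is a minimal PyTorch sketch: a foreground deformation field that predicts offsets for position, rotation, and color, and a background field that predicts position offsets only. All names (DeformField, mask_t0) and sizes are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch, assuming PyTorch; an MLP stands in for the deformation field.
import torch
import torch.nn as nn

class DeformField(nn.Module):
    """Maps (canonical position, time) to per-Gaussian offsets."""
    def __init__(self, out_dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, xyz: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([xyz, t.expand(xyz.shape[0], 1)], dim=-1))

# Foreground: changes in position (3), rotation (4, quaternion) and color (3);
# background: position offsets only, per the paper's description.
fg_field = DeformField(out_dim=3 + 4 + 3)
bg_field = DeformField(out_dim=3)

# Canonical Gaussians, split once using sparse masks at t=0 (mask_t0 is a
# hypothetical boolean tensor marking foreground points).
xyz = torch.randn(1000, 3)
mask_t0 = torch.rand(1000) > 0.5

def deform(t: torch.Tensor):
    fg = fg_field(xyz[mask_t0], t)    # d_xyz | d_rot | d_rgb
    bg = bg_field(xyz[~mask_t0], t)   # d_xyz only
    return fg, bg

fg_out, bg_out = deform(torch.tensor([0.3]))
```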
Related papers
- DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair [41.78277294238276]
We propose a method that can use only two images, without prior poses, to fit Gaussians in dynamic environments. This strategy decomposes dynamic scenes into piece-wise rigid components and jointly estimates the camera pose and the motions of dynamic objects. Experiments on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art approaches.
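To illustrate the piece-wise rigid idea, a minimal PyTorch sketch follows: each cluster of points shares one rigid transform. The actual method's clustering and joint pose estimation are not reproduced; the labels and transforms here are placeholders.

```python
# Minimal sketch, assuming PyTorch: one SE(3) transform per rigid cluster.
import torch

def apply_piecewise_rigid(xyz, labels, rotations, translations):
    """xyz: (N,3) points; labels: (N,) cluster ids;
    rotations: (K,3,3); translations: (K,3)."""
    R = rotations[labels]       # (N,3,3) per-point rotation
    t = translations[labels]    # (N,3) per-point translation
    return torch.einsum('nij,nj->ni', R, xyz) + t

xyz = torch.randn(500, 3)
labels = torch.randint(0, 4, (500,))       # 4 rigid components
R = torch.eye(3).repeat(4, 1, 1)           # identity init
t = torch.zeros(4, 3, requires_grad=True)  # optimized during fitting
moved = apply_piecewise_rigid(xyz, labels, R, t)
```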
arXiv Detail & Related papers (2024-12-01T15:25:33Z)
- PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments [49.38692556283867]
We propose a photo-realistic and geometry-aware RGB-D SLAM method by extending Gaussian splatting.
Our method is composed of three main modules that 1) map the dynamic foreground, including non-rigid humans and rigid items, 2) reconstruct the static background, and 3) localize the camera.
Experiments on various real-world datasets demonstrate that our method outperforms state-of-the-art approaches in terms of camera localization and scene representation.
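A plain-Python skeleton of that three-module structure might look as follows; all internals are placeholders and the names are assumptions, not the paper's implementation.

```python
# Skeleton only: illustrates the per-frame flow of the three modules.
class PGSlamPipeline:
    def localize_camera(self, frame):
        """Estimate the current camera pose against the static map."""
        pose = ...  # placeholder
        return pose

    def reconstruct_static_background(self, frame, pose):
        """Grow and refine the static background Gaussian map."""
        ...

    def map_dynamic_foreground(self, frame, pose):
        """Update Gaussians for non-rigid humans and moving rigid items."""
        ...

    def step(self, frame):
        pose = self.localize_camera(frame)
        self.reconstruct_static_background(frame, pose)
        self.map_dynamic_foreground(frame, pose)
```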
arXiv Detail & Related papers (2024-11-24T12:00:55Z)
- Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting [94.84688557937123]
Video-3DGS is a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach uses a two-stage 3D Gaussian optimization process tailored for editing dynamic monocular videos. It enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.
arXiv Detail & Related papers (2024-06-04T17:57:37Z)
- Hybrid Explicit Representation for Ultra-Realistic Head Avatars [55.829497543262214]
We introduce a novel approach to creating ultra-realistic head avatars and rendering them in real time. A UV-mapped 3D mesh is utilized to capture sharp and rich textures on smooth surfaces, while 3D Gaussian Splatting is employed to represent complex geometric structures. Experiments show that our modeled results exceed those of state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-18T04:01:26Z)
- SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting [7.553079256251747]
We extend 3D Gaussian Splatting to reconstruct dynamic scenes.
We produce high-quality renderings of general dynamic scenes with competitive quantitative performance.
Our results can be viewed in real time in our dynamic interactive viewer.
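A minimal sketch of the sliding-window split over a frame sequence, with one Gaussian model per window; the window size and overlap below are illustrative values, not the paper's.

```python
# Sketch: partition frames into overlapping windows; the shared boundary
# frames can be used to keep adjacent reconstructions consistent.
def sliding_windows(num_frames: int, size: int = 16, overlap: int = 4):
    """Yield (start, end) frame ranges with shared boundary frames."""
    step = size - overlap
    start = 0
    while start < num_frames:
        yield start, min(start + size, num_frames)
        if start + size >= num_frames:
            break
        start += step

windows = list(sliding_windows(100))
# e.g. [(0, 16), (12, 28), (24, 40), ...]; one GS model per window.
```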
arXiv Detail & Related papers (2023-12-20T03:54:03Z)
- DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation [66.7719069053058]
DeformGS is an approach to recover scene flow in highly deformable scenes using simultaneous video captures of a dynamic scene from multiple cameras.
DeformGS improves 3D tracking by an average of 55.8% compared to the state-of-the-art.
With sufficient texture, DeformGS achieves a median tracking error of 3.3 mm on a 1.5 m x 1.5 m cloth.
arXiv Detail & Related papers (2023-11-30T18:53:03Z)
- MoDA: Modeling Deformable 3D Objects from Casual Videos [84.29654142118018]
We propose neural dual quaternion blend skinning (NeuDBS) to achieve 3D point deformation without skin-collapsing artifacts.
To register 2D pixels across different frames, we establish a correspondence between canonical feature embeddings that encode 3D points within the canonical space.
Our approach can reconstruct 3D models for humans and animals with better qualitative and quantitative performance than state-of-the-art methods.
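For context, classical dual quaternion blend skinning (which NeuDBS builds on) can be sketched in NumPy as follows; the paper's neural components (learned skinning weights, canonical embeddings) are not reproduced.

```python
# Sketch of classical dual quaternion blending (DQB), not the paper's NeuDBS.
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def dq_from_rt(q_rot, t):
    """Build a dual quaternion from a unit rotation quaternion and translation."""
    q_dual = 0.5 * qmul(np.array([0.0, *t]), q_rot)
    return q_rot, q_dual

def dqb_transform(point, dqs, weights):
    """Blend weighted dual quaternions and apply the result to one point."""
    qr = sum(w * q[0] for q, w in zip(dqs, weights))
    qd = sum(w * q[1] for q, w in zip(dqs, weights))
    n = np.linalg.norm(qr)
    qr, qd = qr / n, qd / n  # renormalize after blending
    # Rotation: p' = qr * (0, p) * conj(qr); translation: 2 * qd * conj(qr).
    conj = qr * np.array([1, -1, -1, -1])
    rotated = qmul(qmul(qr, np.array([0.0, *point])), conj)[1:]
    trans = 2.0 * qmul(qd, conj)[1:]
    return rotated + trans

# Two "bones": identity, and a 90-degree rotation about z with a shift in x.
dq_a = dq_from_rt(np.array([1.0, 0, 0, 0]), np.zeros(3))
dq_b = dq_from_rt(np.array([np.cos(np.pi/4), 0, 0, np.sin(np.pi/4)]),
                  np.array([1.0, 0, 0]))
print(dqb_transform(np.array([1.0, 0, 0]), [dq_a, dq_b], [0.5, 0.5]))
```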
arXiv Detail & Related papers (2023-04-17T13:49:04Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via neural feature rendering.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
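A toy stand-in for such a depth-inpainting step, assuming missing depths are encoded as zeros and filling each hole with the median of valid depths in a local window; the paper's learned module is not reproduced here.

```python
# Toy illustration only: median-fill of missing depth values in NumPy.
import numpy as np

def inpaint_depth(depth: np.ndarray, win: int = 2) -> np.ndarray:
    out = depth.copy()
    holes = np.argwhere(depth == 0)
    for y, x in holes:
        patch = depth[max(y - win, 0):y + win + 1, max(x - win, 0):x + win + 1]
        valid = patch[patch > 0]
        if valid.size:
            out[y, x] = np.median(valid)
    return out

depth = np.random.rand(64, 64) + 1.0
depth[20:24, 30:34] = 0.0            # simulate a missing region
filled = inpaint_depth(depth)
assert (filled > 0).all()
```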
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
- Deep 3D Mask Volume for View Synthesis of Dynamic Scenes [49.45028543279115]
We introduce a multi-view video dataset captured with a custom 10-camera rig at 120 FPS.
The dataset contains 96 high-quality scenes showing various visual effects and human interactions in outdoor scenes.
We develop a new algorithm, Deep 3D Mask Volume, which enables temporally-stable view extrapolation from binocular videos of dynamic scenes, captured by static cameras.
arXiv Detail & Related papers (2021-08-30T17:55:28Z)