Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors
- URL: http://arxiv.org/abs/2506.12716v1
- Date: Sun, 15 Jun 2025 04:40:20 GMT
- Title: Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors
- Authors: Wen-Hsuan Chu, Lei Ke, Jianmeng Liu, Mingxiao Huo, Pavel Tokmakov, Katerina Fragkiadaki
- Abstract summary: GenMOJO is a novel approach that integrates rendering-based deformable 3D Gaussian optimization with generative priors for view synthesis. It decomposes the scene into individual objects, optimizing a differentiable set of deformable Gaussians per object. The resulting model generates 4D object reconstructions over space and time, and produces accurate 2D and 3D point tracks from monocular input.
- Score: 22.797709893040906
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We tackle the challenge of generating dynamic 4D scenes from monocular, multi-object videos with heavy occlusions, and introduce GenMOJO, a novel approach that integrates rendering-based deformable 3D Gaussian optimization with generative priors for view synthesis. While existing models perform well on novel view synthesis for isolated objects, they struggle to generalize to complex, cluttered scenes. To address this, GenMOJO decomposes the scene into individual objects, optimizing a differentiable set of deformable Gaussians per object. This object-wise decomposition allows leveraging object-centric diffusion models to infer unobserved regions in novel viewpoints. It performs joint Gaussian splatting to render the full scene, capturing cross-object occlusions, and enabling occlusion-aware supervision. To bridge the gap between object-centric priors and the global frame-centric coordinate system of videos, GenMOJO uses differentiable transformations that align generative and rendering constraints within a unified framework. The resulting model generates 4D object reconstructions over space and time, and produces accurate 2D and 3D point tracks from monocular input. Quantitative evaluations and perceptual human studies confirm that GenMOJO generates more realistic novel views of scenes and produces more accurate point tracks compared to existing approaches.
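To make the optimization structure described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch: per-object deformable Gaussians kept in a canonical frame, a differentiable object-to-world transform, a joint-splatting reconstruction loss, and an object-centric generative prior term on novel views. The parameterizations and the helpers `rasterize`, `prior_loss`, and `sample_novel_camera` are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a GenMOJO-style objective (not the authors' code).
import torch


def rotation_from_axis_angle(w: torch.Tensor) -> torch.Tensor:
    """Rotation matrix from an axis-angle 3-vector via the matrix exponential."""
    zero = torch.zeros((), dtype=w.dtype, device=w.device)
    W = torch.stack([
        torch.stack([zero, -w[2], w[1]]),
        torch.stack([w[2], zero, -w[0]]),
        torch.stack([-w[1], w[0], zero]),
    ])
    return torch.linalg.matrix_exp(W)


class ObjectGaussians(torch.nn.Module):
    """Deformable Gaussians for one object, kept in the object's canonical frame."""

    def __init__(self, num_gaussians: int, num_frames: int):
        super().__init__()
        self.means = torch.nn.Parameter(0.1 * torch.randn(num_gaussians, 3))
        self.log_scales = torch.nn.Parameter(torch.zeros(num_gaussians, 3))
        self.colors = torch.nn.Parameter(torch.rand(num_gaussians, 3))
        self.opacities = torch.nn.Parameter(torch.zeros(num_gaussians, 1))
        # Per-frame translation offsets: a minimal stand-in for a deformation field.
        self.deform = torch.nn.Parameter(torch.zeros(num_frames, num_gaussians, 3))
        # Per-frame object-to-world transform (axis-angle rotation + translation):
        # the differentiable bridge between object-centric priors and the video's
        # frame-centric coordinates.
        self.rotations = torch.nn.Parameter(torch.zeros(num_frames, 3))
        self.translations = torch.nn.Parameter(torch.zeros(num_frames, 3))

    def canonical_means(self, t: int) -> torch.Tensor:
        """Deformed Gaussian centers in the object's own (canonical) frame."""
        return self.means + self.deform[t]

    def world_means(self, t: int) -> torch.Tensor:
        """Map the deformed centers into world coordinates, differentiably."""
        R = rotation_from_axis_angle(self.rotations[t])
        return self.canonical_means(t) @ R.T + self.translations[t]


def training_loss(objects, frames, cameras, rasterize, prior_loss,
                  sample_novel_camera, lam: float = 0.1) -> torch.Tensor:
    """One joint loss evaluation (sketch only).

    `rasterize`, `prior_loss`, and `sample_novel_camera` are hypothetical stand-ins
    for a differentiable Gaussian splatting renderer, an object-centric diffusion /
    view-synthesis prior, and a novel-viewpoint sampler, respectively.
    """
    loss = torch.zeros(())
    for t, (frame, cam) in enumerate(zip(frames, cameras)):
        # Joint splatting: composite every object's Gaussians in world coordinates so
        # cross-object occlusions are rendered and supervision is occlusion-aware.
        scene = [(obj.world_means(t), obj) for obj in objects]
        loss = loss + torch.nn.functional.mse_loss(rasterize(scene, cam), frame)
        # Object-centric prior: render each object alone in its canonical frame from
        # a sampled novel viewpoint and let the generative prior score unseen regions.
        for obj in objects:
            novel = rasterize([(obj.canonical_means(t), obj)], sample_novel_camera())
            loss = loss + lam * prior_loss(novel, t)
    return loss
```

Keeping each object's Gaussians in a canonical frame is what allows an object-centric diffusion model to supervise novel views of that object, while the learned per-frame transform reconciles those priors with the video's global coordinate system.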
Related papers
- HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis [59.25751939710903]
We propose a novel deformable Gaussian splatting framework that addresses embodied view synthesis from long monocular RGB videos. Our method leverages invertible Gaussian Splatting deformation networks to reconstruct large-scale, dynamic environments accurately. Results highlight a practical and scalable solution for EVS in real-world scenarios.
arXiv Detail & Related papers (2025-06-24T03:54:40Z)
- BulletGen: Improving 4D Reconstruction with Bullet-Time Generation [15.225127596594582]
We introduce BulletGen, an approach that takes advantage of generative models to correct errors and complete missing information in a dynamic scene representation. Our method seamlessly blends generative content with both static and dynamic scene components, achieving state-of-the-art results on both novel-view synthesis and 2D/3D tracking tasks.
arXiv Detail & Related papers (2025-06-23T13:03:42Z)
- CasaGPT: Cuboid Arrangement and Scene Assembly for Interior Design [35.11283253765395]
We present a novel approach for indoor scene synthesis, which learns to arrange decomposed cuboid primitives to represent 3D objects within a scene. Our approach, coined CasaGPT for Cuboid Arrangement and Scene Assembly, employs an autoregressive model to sequentially arrange cuboids, producing physically plausible scenes.
arXiv Detail & Related papers (2025-04-28T04:35:04Z)
- 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives [115.67081491747943]
Dynamic 3D scene representation and novel view synthesis are crucial for enabling AR/VR and metaverse applications. We reformulate the reconstruction of a time-varying 3D scene as approximating its underlying 4D volume (a sketch of slicing such a 4D Gaussian at a fixed time appears after this list). We derive several compact variants that effectively reduce the memory footprint to address its storage bottleneck.
arXiv Detail & Related papers (2024-12-30T05:30:26Z)
- HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting [47.67153284714988]
We propose a novel hybrid representation, termed HybridGS, using 2D Gaussians for transient objects per image. We also propose a straightforward yet effective multi-stage training strategy to ensure robust training and high-quality view synthesis. Experiments on benchmark datasets show state-of-the-art novel view synthesis performance in both indoor and outdoor scenes.
arXiv Detail & Related papers (2024-12-05T03:20:35Z)
- NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We perform novel-view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z)
- GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving a superior rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
- DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation [10.250715657201363]
We introduce DreamMesh4D, a novel framework that combines a mesh representation with a geometric skinning technique to generate high-quality 4D objects from a monocular video.
Our method is compatible with modern graphics pipelines, showcasing its potential in the 3D gaming and film industries.
arXiv Detail & Related papers (2024-10-09T10:41:08Z)
- SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer [57.506654943449796]
We propose an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance.
Our method surpasses existing methods in both quality and efficiency.
We devise a novel application that seamlessly transfers motion onto a diverse array of 4D entities.
arXiv Detail & Related papers (2024-04-04T18:05:18Z)
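As referenced in the 4D Gaussian Splatting entry above, a native 4D primitive can be sliced at a fixed time to obtain a time-conditioned 3D Gaussian. The sketch below applies the standard conditional-Gaussian formulas to an (x, y, z, t) Gaussian; it is an illustrative assumption about how such primitives can be evaluated, not the cited paper's exact formulation.

```python
# Illustrative only: slicing a (x, y, z, t) Gaussian at a fixed time using the
# standard conditional-Gaussian formulas; not taken from the cited paper.
import numpy as np


def condition_4d_gaussian(mean4: np.ndarray, cov4: np.ndarray, t: float):
    """Return the 3D mean, covariance, and temporal weight of a 4D Gaussian at time t.

    mean4: (4,) mean over (x, y, z, t);  cov4: (4, 4) covariance.
    """
    mu_s, mu_t = mean4[:3], mean4[3]
    S_ss = cov4[:3, :3]          # spatial block
    S_st = cov4[:3, 3]           # spatial-temporal cross-covariance
    S_tt = cov4[3, 3]            # temporal variance (scalar)
    mean3 = mu_s + S_st * (t - mu_t) / S_tt
    cov3 = S_ss - np.outer(S_st, S_st) / S_tt
    # Marginal density in t (up to normalization); a plausible way to modulate a
    # primitive's opacity so it fades in and out around its temporal center.
    weight = np.exp(-0.5 * (t - mu_t) ** 2 / S_tt)
    return mean3, cov3, weight


if __name__ == "__main__":
    mean4 = np.array([0.0, 0.0, 0.0, 0.5])
    A = np.random.default_rng(0).normal(size=(4, 4))
    cov4 = A @ A.T + 4.0 * np.eye(4)     # a valid (positive-definite) covariance
    m3, C3, w = condition_4d_gaussian(mean4, cov4, t=1.0)
    print(m3.shape, C3.shape, w)         # (3,) (3, 3) scalar weight
```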
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.