HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis
- URL: http://arxiv.org/abs/2506.19291v1
- Date: Tue, 24 Jun 2025 03:54:40 GMT
- Title: HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis
- Authors: Xiaoyuan Wang, Yizhou Zhao, Botao Ye, Xiaojun Shan, Weijie Lyu, Lu Qi, Kelvin C. K. Chan, Yinxiao Li, Ming-Hsuan Yang
- Abstract summary: We propose a novel deformable Gaussian splatting framework that addresses embodied view synthesis from long monocular RGB videos. Our method leverages invertible Gaussian Splatting deformation networks to reconstruct large-scale, dynamic environments accurately. Results highlight a practical and scalable solution for EVS in real-world scenarios.
- Score: 59.25751939710903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose HoliGS, a novel deformable Gaussian splatting framework that addresses embodied view synthesis (EVS) from long monocular RGB videos. Unlike prior 4D Gaussian splatting and dynamic NeRF pipelines, which struggle with training overhead in minute-long captures, our method leverages invertible Gaussian splatting deformation networks to reconstruct large-scale, dynamic environments accurately. Specifically, we decompose each scene into a static background plus time-varying objects, each represented by learned Gaussian primitives undergoing global rigid transformations, skeleton-driven articulation, and subtle non-rigid deformations via an invertible neural flow. By attaching Gaussians to a complete canonical foreground shape, this hierarchical warping strategy enables robust free-viewpoint novel-view rendering from various embodied camera trajectories (e.g., egocentric or third-person follow), which may involve substantial viewpoint changes and interactions between multiple actors. Our experiments demonstrate that HoliGS achieves superior reconstruction quality on challenging datasets while significantly reducing both training and rendering time compared to state-of-the-art monocular deformable NeRFs. These results highlight a practical and scalable solution for EVS in real-world scenarios. The source code will be released.
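The hierarchical foreground warp described in the abstract (global rigid transform, skeleton-driven articulation, and a subtle non-rigid deformation) can be pictured with a short PyTorch-style sketch. This is a minimal illustration, not the authors' released code: the names (warp_foreground, skin_w, flow) and the exact ordering of the warps are assumptions, and a plain residual MLP stands in for the invertible neural flow used in the paper.

```python
# Hypothetical sketch of a hierarchical Gaussian warp: articulation -> rigid -> non-rigid.
import torch


def quat_to_rotmat(q: torch.Tensor) -> torch.Tensor:
    """Convert unit quaternions (B, 4) to rotation matrices (B, 3, 3)."""
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y),
        2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(-1, 3, 3)


def warp_foreground(
    xyz_canonical: torch.Tensor,  # (N, 3) canonical Gaussian centers
    root_R: torch.Tensor,         # (3, 3) global rigid rotation at time t
    root_t: torch.Tensor,         # (3,)   global rigid translation at time t
    bone_R: torch.Tensor,         # (B, 3, 3) per-bone rotations at time t
    bone_t: torch.Tensor,         # (B, 3)    per-bone translations at time t
    skin_w: torch.Tensor,         # (N, B) skinning weights (rows sum to 1)
    flow: torch.nn.Module,        # non-rigid residual network (stand-in for the invertible flow)
) -> torch.Tensor:
    """Apply articulated -> rigid -> non-rigid warps to canonical Gaussian centers."""
    # 1) Skeleton-driven articulation: blend per-bone transforms (LBS-style).
    per_bone = torch.einsum('bij,nj->nbi', bone_R, xyz_canonical) + bone_t  # (N, B, 3)
    xyz = torch.einsum('nb,nbi->ni', skin_w, per_bone)                      # (N, 3)
    # 2) Global rigid transform of the whole object.
    xyz = xyz @ root_R.T + root_t
    # 3) Subtle non-rigid deformation as a learned residual.
    return xyz + flow(xyz)


if __name__ == "__main__":
    N, B = 1024, 8
    flow = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.SiLU(),
                               torch.nn.Linear(64, 3))
    q = torch.nn.functional.normalize(torch.randn(B, 4), dim=-1)
    warped = warp_foreground(
        xyz_canonical=torch.randn(N, 3),
        root_R=torch.eye(3), root_t=torch.zeros(3),
        bone_R=quat_to_rotmat(q), bone_t=torch.zeros(B, 3),
        skin_w=torch.softmax(torch.randn(N, B), dim=-1),
        flow=flow,
    )
    print(warped.shape)  # torch.Size([1024, 3])
```

In the invertible formulation described in the abstract, the same mapping can also be run backwards to pull observed points into canonical space, which a generic residual MLP like the stand-in above does not guarantee.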
Related papers
- A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory [8.972911362220803]
We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale scenes on a single consumer-grade GPU. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection. A lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering.
arXiv Detail & Related papers (2025-07-01T18:12:43Z)
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images [39.03889696169877]
RoGSplat is a novel approach for synthesizing high-fidelity novel views of unseen humans from sparse multi-view images. Our method outperforms state-of-the-art methods in novel view synthesis and cross-dataset generalization.
arXiv Detail & Related papers (2025-03-18T12:18:34Z)
- MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction [84.07233691641193]
We introduce MonoGSDF, a novel method that couples primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization. Experiments on real-world datasets show that it outperforms prior methods while maintaining efficiency.
arXiv Detail & Related papers (2024-11-25T20:07:07Z)
- NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We leverage novel-view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z)
- WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections [8.261637198675151]
Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics.
We propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections.
Our approach outperforms existing approaches in the rendering quality of novel-view and appearance synthesis, with high convergence and rendering speed.
arXiv Detail & Related papers (2024-06-04T15:17:37Z)
- FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes [50.534213038479926]
FreeSplat is capable of reconstructing geometrically consistent 3D scenes from long-sequence input for free-view synthesis.
We propose a simple but effective free-view training strategy that ensures robust view synthesis across a broader view range regardless of the number of views.
arXiv Detail & Related papers (2024-05-28T08:40:14Z)
- Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections [12.807052947367692]
Photometric variation and transient occluders in unconstrained images make it difficult to reconstruct the original scene accurately.
Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF).
Inspired by this fact, we propose Gaussian in the wild (GS-W), a method that uses 3D Gaussian points to reconstruct the scene.
arXiv Detail & Related papers (2024-03-23T03:55:41Z)
- SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes [59.23385953161328]
Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics.
We propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians; a hypothetical sketch of this decomposition appears after this list.
Our method enables user-controlled motion editing while retaining high-fidelity appearance.
arXiv Detail & Related papers (2023-12-04T11:57:14Z)
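The sparse-control-point decomposition summarized for SC-GS above can be illustrated with a small, self-contained sketch. This is a hypothetical rendering of the idea rather than the SC-GS implementation: the function name deform_with_control_points, the K-nearest-neighbour radial-basis weighting, and the per-control-point rigid motions are assumptions made for exposition.

```python
# Hypothetical sketch: drive dense Gaussians from a few control-point motions.
import torch


def deform_with_control_points(
    gaussians: torch.Tensor,  # (N, 3) dense Gaussian centers (canonical)
    ctrl_xyz: torch.Tensor,   # (M, 3) sparse control-point positions
    ctrl_R: torch.Tensor,     # (M, 3, 3) per-control-point rotations at time t
    ctrl_t: torch.Tensor,     # (M, 3)    per-control-point translations at time t
    k: int = 4,
    sigma: float = 0.1,
) -> torch.Tensor:
    """Blend the K nearest control-point motions onto each dense Gaussian (LBS-style)."""
    # K nearest control points per Gaussian, with RBF-like blending weights.
    d2 = torch.cdist(gaussians, ctrl_xyz) ** 2          # (N, M) squared distances
    d2, idx = d2.topk(k, dim=-1, largest=False)         # (N, k)
    w = torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)   # (N, k), rows sum to 1
    # Apply each neighbour's rigid motion about its own control point.
    rel = gaussians[:, None, :] - ctrl_xyz[idx]                        # (N, k, 3)
    moved = torch.einsum('nkij,nkj->nki', ctrl_R[idx], rel) \
        + ctrl_xyz[idx] + ctrl_t[idx]                                  # (N, k, 3)
    return (w[..., None] * moved).sum(dim=1)                           # (N, 3)


if __name__ == "__main__":
    out = deform_with_control_points(
        gaussians=torch.randn(2048, 3),
        ctrl_xyz=torch.randn(64, 3),
        ctrl_R=torch.eye(3).expand(64, 3, 3),
        ctrl_t=torch.zeros(64, 3),
    )
    print(out.shape)  # torch.Size([2048, 3])
```

Editing a handful of control-point transforms then moves all nearby Gaussians coherently, which is the intuition behind the user-controlled motion editing mentioned in that summary.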
This list is automatically generated from the titles and abstracts of the papers in this site.