SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting
- URL: http://arxiv.org/abs/2601.00285v1
- Date: Thu, 01 Jan 2026 09:53:03 GMT
- Title: SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting
- Authors: Jun-Jee Chao, Volkan Isler,
- Abstract summary: We present SV-GS, a framework that simultaneously estimates a deformation model and the object's motion over time under sparse observations.<n>Our method outperforms existing approaches under sparse observations by up to 34% in PSNR.
- Score: 19.12278036176021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing a dynamic target moving over a large area is challenging. Standard approaches for dynamic object reconstruction require dense coverage in both the viewing space and the temporal dimension, typically relying on multi-view videos captured at each time step. However, such setups are only possible in constrained environments. In real-world scenarios, observations are often sparse over time and captured sparsely from diverse viewpoints (e.g., from security cameras), making dynamic reconstruction highly ill-posed. We present SV-GS, a framework that simultaneously estimates a deformation model and the object's motion over time under sparse observations. To initialize SV-GS, we leverage a rough skeleton graph and an initial static reconstruction as inputs to guide motion estimation. (Later, we show that this input requirement can be relaxed.) Our method optimizes a skeleton-driven deformation field composed of a coarse skeleton joint pose estimator and a module for fine-grained deformations. By making only the joint pose estimator time-dependent, our model enables smooth motion interpolation while preserving learned geometric details. Experiments on synthetic datasets show that our method outperforms existing approaches under sparse observations by up to 34% in PSNR, and achieves comparable performance to dense monocular video methods on real-world datasets despite using significantly fewer frames. Moreover, we demonstrate that the input initial static reconstruction can be replaced by a diffusion-based generative prior, making our method more practical for real-world scenarios.
Related papers
- MOSAIC-GS: Monocular Scene Reconstruction via Advanced Initialization for Complex Dynamic Environments [12.796165448365949]
MOSAIC-GS is a novel, fully explicit, and computationally efficient approach for high-fidelity dynamic scene reconstruction from monocular videos.<n>We leverage multiple geometric cues, such as depth, optical flow, dynamic object segmentation, and point tracking.<n>We demonstrate that MOSAIC-GS achieves substantially faster optimization and rendering compared to existing methods.
arXiv Detail & Related papers (2026-01-08T20:48:24Z) - IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes [25.939318593012484]
Reconstructing dynamic driving scenes is essential for developing autonomous systems through sensor-realistic simulation.<n>We present IDSplat, a self-supervised 3D Gaussian Splatting framework that reconstructs dynamic scenes with explicit instance decomposition and learnable motion trajectories.<n>Our method achieves competitive reconstruction quality while maintaining instance-level decomposition and generalizes across diverse sequences and view densities without retraining.
arXiv Detail & Related papers (2025-11-24T15:48:08Z) - GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering [54.489285024494855]
Video stabilization is pivotal for video processing, as it removes unwanted shakiness while preserving the original user motion intent.<n>Existing approaches, depending on the domain they operate, suffer from several issues that degrade the user experience.<n>We introduce textbfGaVS, a novel 3D-grounded approach that reformulates video stabilization as a temporally-consistent local reconstruction and rendering' paradigm.
arXiv Detail & Related papers (2025-06-30T15:24:27Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM)<n>It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene.<n>It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - Learning Multi-frame and Monocular Prior for Estimating Geometry in Dynamic Scenes [56.936178608296906]
We present a new model, coined MMP, to estimate the geometry in a feed-forward manner.<n>Based on the recent Siamese architecture, we introduce a new trajectory encoding module.<n>We find MMP can achieve state-of-the-art quality in feed-forward pointmap prediction.
arXiv Detail & Related papers (2025-05-03T08:28:15Z) - Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction [50.873820265165975]
We introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for dynamic scene reconstruction.<n>We propose a GS-Threshold Joint Modeling strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling.<n>We contribute the first event-inclusive 4D benchmark with synthetic and real-world dynamic scenes, on which our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-11-25T08:23:38Z) - MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.<n>By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes.<n>We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - S3O: A Dual-Phase Approach for Reconstructing Dynamic Shape and Skeleton of Articulated Objects from Single Monocular Video [13.510513575340106]
Reconstructing dynamic articulated objects from a singular monocular video is challenging, requiring joint estimation of shape, motion, and camera parameters from limited views.
We propose Synergistic Shape and Skeleton Optimization (S3O), a novel two-phase method that efficiently learns parametric models including visible shapes and underlying skeletons.
Our experimental evaluations on standard benchmarks and the PlanetZoo dataset affirm that S3O provides more accurate 3D reconstruction, and plausible skeletons, and reduces the training time by approximately 60% compared to the state-of-the-art.
arXiv Detail & Related papers (2024-05-21T09:01:00Z) - SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes [75.9110646062442]
We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner.
Our method takes multi-view RGB videos and background images from static cameras with known camera parameters as input.
We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.
arXiv Detail & Related papers (2023-08-16T09:50:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.