Learning Dynamic Scene Reconstruction with Sinusoidal Geometric Priors
- URL: http://arxiv.org/abs/2512.22295v1
- Date: Thu, 25 Dec 2025 20:51:19 GMT
- Title: Learning Dynamic Scene Reconstruction with Sinusoidal Geometric Priors
- Authors: Tian Guo, Hui Yuan, Philip Xu, David Elizondo
- Abstract summary: We propose SirenPose, a novel loss function that combines the periodic activation of sinusoidal representation networks with geometric priors derived from keypoint structures to improve the accuracy of 3D scene reconstruction.
- Score: 8.153339288887922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose SirenPose, a novel loss function that combines the periodic activation properties of sinusoidal representation networks with geometric priors derived from keypoint structures to improve the accuracy of dynamic 3D scene reconstruction. Existing approaches often struggle to maintain motion modeling accuracy and spatiotemporal consistency in fast moving and multi target scenes. By introducing physics inspired constraint mechanisms, SirenPose enforces coherent keypoint predictions across both spatial and temporal dimensions. We further expand the training dataset to 600,000 annotated instances to support robust learning. Experimental results demonstrate that models trained with SirenPose achieve significant improvements in spatiotemporal consistency metrics compared to prior methods, showing superior performance in handling rapid motion and complex scene changes.
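The abstract does not spell out the SirenPose loss itself, but it builds on the periodic activation properties of sinusoidal representation networks (SIRENs). As background, the sketch below shows a minimal SIREN layer with the standard sine activation and frequency-scaled initialization; the function names, the `omega0` value, and the NumPy formulation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def siren_layer(x, weight, bias, omega0=30.0):
    """One SIREN layer: a linear map followed by a sine activation.

    The frequency scale omega0 controls how rapidly the activations
    oscillate, which is what lets SIRENs fit high-frequency signals
    such as fast keypoint motion. (omega0=30 is a common default.)
    """
    return np.sin(omega0 * (x @ weight + bias))

def siren_init(fan_in, fan_out, omega0=30.0, first_layer=False, rng=None):
    """Uniform SIREN-style initialization.

    First layer: weights in [-1/fan_in, 1/fan_in].
    Later layers: weights in [-sqrt(6/fan_in)/omega0, +sqrt(6/fan_in)/omega0],
    which keeps pre-activations in a range where sin() stays well-behaved.
    """
    rng = rng or np.random.default_rng(0)
    bound = 1.0 / fan_in if first_layer else np.sqrt(6.0 / fan_in) / omega0
    weight = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    return weight, np.zeros(fan_out)
```

Stacking such layers gives a coordinate network whose outputs are smooth and periodic in the input coordinates, which is the property a geometry-aware loss like SirenPose can exploit when enforcing spatiotemporal consistency of keypoints.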
Related papers
- GeoMotion: Rethinking Motion Segmentation via Latent 4D Geometry [61.24189040578178]
We propose a fully learning-based approach that directly infers moving objects from latent feature representations via attention mechanisms. Our key insight is to bypass explicit correspondence estimation and instead let the model learn to implicitly disentangle object and camera motion. Our approach achieves state-of-the-art motion segmentation performance with high efficiency.
arXiv Detail & Related papers (2026-02-25T11:36:33Z) - MOSAIC-GS: Monocular Scene Reconstruction via Advanced Initialization for Complex Dynamic Environments [12.796165448365949]
MOSAIC-GS is a novel, fully explicit, and computationally efficient approach for high-fidelity dynamic scene reconstruction from monocular videos. We leverage multiple geometric cues, such as depth, optical flow, dynamic object segmentation, and point tracking. We demonstrate that MOSAIC-GS achieves substantially faster optimization and rendering compared to existing methods.
arXiv Detail & Related papers (2026-01-08T20:48:24Z) - SirenPose: Dynamic Scene Reconstruction via Geometric Supervision [12.966077380225856]
We introduce SirenPose, a geometry-aware loss formulation that integrates the periodic activation properties of sinusoidal representation networks with keypoint-based geometric supervision. In pose estimation, SirenPose outperforms MonST3R with lower absolute trajectory error as well as reduced translational and rotational relative pose error.
arXiv Detail & Related papers (2025-12-23T17:23:21Z) - PEGS: Physics-Event Enhanced Large Spatiotemporal Motion Reconstruction via 3D Gaussian Splatting [8.672740691555736]
PEGS is a framework that integrates Physical priors with Event stream enhancement within a 3D Gaussian Splatting pipeline. We introduce a triple-level supervision scheme that enforces physical plausibility via an acceleration constraint. We also contribute the first RGB-Event paired dataset targeting natural, fast rigid motion across diverse scenes.
arXiv Detail & Related papers (2025-11-21T10:27:51Z) - STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene [54.418259038624406]
Existing methods adopt a unified representation model (e.g., Gaussian) to directly match the scene from a frame camera. However, this unified paradigm fails to capture the temporal features of objects, owing to discretized frame features and discontinuous spatial features between background and objects. In this work, we introduce an event camera to compensate for the frame camera, and propose a spatiotemporal-disentangled Gaussian splatting framework for high-dynamic scene reconstruction.
arXiv Detail & Related papers (2025-06-29T09:32:06Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM). It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering [15.873329633980015]
Existing 3DGS-based methods for dynamic reconstruction often suffer from spatio-temporal coupling. We propose STDR (Spatio-Temporal Decoupling for Real-time rendering), a plug-and-play module that learns spatiotemporal probability distributions for each scene.
arXiv Detail & Related papers (2025-05-28T14:26:41Z) - Geometry-aware Active Learning of Spatiotemporal Dynamic Systems [4.251030047034566]
This paper proposes a geometry-aware active learning framework for modeling dynamic systems. We develop an adaptive active learning strategy to strategically identify spatial locations for data collection and further maximize the prediction accuracy.
arXiv Detail & Related papers (2025-04-26T19:56:38Z) - EMoTive: Event-guided Trajectory Modeling for 3D Motion Estimation [59.33052312107478]
Event cameras offer possibilities for 3D motion estimation through continuous adaptive pixel-level responses to scene changes. This paper presents EMoTive, a novel event-based framework that models non-uniform trajectories via event-guided parametric curves. For motion representation, we introduce a density-aware adaptation mechanism to fuse spatial and temporal features under event guidance. The final 3D motion estimation is achieved through multi-temporal sampling of parametric trajectories, flows, and depth motion fields.
arXiv Detail & Related papers (2025-03-14T13:15:54Z) - Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction [3.9363268745580426]
AT-GS is a novel method for reconstructing high-quality dynamic surfaces from multi-view videos through per-frame incremental optimization.
We reduce temporal jittering in dynamic surfaces by ensuring consistency in curvature maps across consecutive frames.
Our method achieves superior accuracy and temporal coherence in dynamic surface reconstruction, delivering high-fidelity space-time novel view synthesis.
arXiv Detail & Related papers (2024-11-10T21:30:16Z) - MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously only used for static scenes, to dynamic scenes. We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering [49.36767999382054]
We present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation. PVG exhibits 900-fold acceleration in rendering over the best alternative.
arXiv Detail & Related papers (2023-11-30T13:53:50Z) - Triplet Attention Transformer for Spatiotemporal Predictive Learning [9.059462850026216]
We propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features.
The model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions.
arXiv Detail & Related papers (2023-10-28T12:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.