SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting
- URL: http://arxiv.org/abs/2512.04315v1
- Date: Wed, 03 Dec 2025 23:05:01 GMT
- Title: SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting
- Authors: Yonghan Lee, Tsung-Wei Huang, Shiv Gehlot, Jaehoon Choi, Guan-Ming Su, Dinesh Manocha
- Abstract summary: We present a novel multi-video 4D Gaussian Splatting (4DGS) approach designed to handle real-world, unsynchronized video sets. Our approach, SyncTrack4D, directly leverages a dense 4D track representation of dynamic scene parts as cues for simultaneous cross-video synchronization and 4DGS reconstruction. We evaluate our approach on the Panoptic Studio and SyncNeRF Blender datasets, demonstrating sub-frame synchronization accuracy with an average temporal error below 0.26 frames, and high-fidelity 4D reconstruction reaching a PSNR of 26.3.
- Score: 50.69165364520998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling dynamic 3D scenes is challenging due to their high-dimensional nature, which requires aggregating information from multiple views to reconstruct time-evolving 3D geometry and motion. We present a novel multi-video 4D Gaussian Splatting (4DGS) approach designed to handle real-world, unsynchronized video sets. Our approach, SyncTrack4D, directly leverages a dense 4D track representation of dynamic scene parts as cues for simultaneous cross-video synchronization and 4DGS reconstruction. We first compute dense per-video 4D feature tracks and cross-video track correspondences using a Fused Gromov-Wasserstein optimal transport approach. Next, we perform global frame-level temporal alignment to maximize the overlapping motion of matched 4D tracks. Finally, we achieve sub-frame synchronization through our multi-video 4D Gaussian splatting built upon a motion-spline scaffold representation. The final output is a synchronized 4DGS representation with dense, explicit 3D trajectories and per-video temporal offsets. We evaluate our approach on the Panoptic Studio and SyncNeRF Blender datasets, demonstrating sub-frame synchronization accuracy with an average temporal error below 0.26 frames, and high-fidelity 4D reconstruction reaching a PSNR of 26.3 on the Panoptic Studio dataset. To the best of our knowledge, our work is the first general 4D Gaussian Splatting approach for unsynchronized video sets, without assuming the existence of predefined scene objects or prior models.
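The first two steps of the abstract's pipeline, cross-video track matching via Fused Gromov-Wasserstein (FGW) optimal transport and a frame-level offset search that maximizes the overlapping motion of matched tracks, can be sketched compactly. The snippet below is a minimal illustration, not the paper's implementation: it assumes pre-extracted per-track features and 3D trajectories, uses the POT library's `fused_gromov_wasserstein` as one possible FGW solver, and scores candidate offsets with a simple velocity-correlation objective; the function names, inputs, and scoring choice are illustrative assumptions, and the sub-frame refinement via the motion-spline scaffold is not shown.

```python
# Hedged sketch of FGW-based track matching and frame-level offset estimation.
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)


def match_tracks_fgw(feat_a, feat_b, pos_a, pos_b, alpha=0.5):
    """Soft-match tracks across two videos with Fused Gromov-Wasserstein OT."""
    # Cross-video feature cost (Wasserstein term).
    M = ot.dist(feat_a, feat_b, metric="sqeuclidean")
    # Intra-video structural costs (Gromov term): pairwise track distances.
    C1 = ot.dist(pos_a, pos_a, metric="euclidean")
    C2 = ot.dist(pos_b, pos_b, metric="euclidean")
    p = np.full(len(feat_a), 1.0 / len(feat_a))
    q = np.full(len(feat_b), 1.0 / len(feat_b))
    # alpha balances structural consistency against feature similarity.
    T = ot.gromov.fused_gromov_wasserstein(M, C1, C2, p, q, alpha=alpha)
    # Hard assignment from the soft coupling: best match in B for each track in A.
    return T.argmax(axis=1)


def estimate_frame_offset(tracks_a, tracks_b, matches, max_offset=30):
    """Frame-level offset maximizing overlapping motion of matched tracks."""
    vel_a = np.diff(tracks_a, axis=1)           # (Na, Ta-1, 3) per-frame velocities
    vel_b = np.diff(tracks_b, axis=1)[matches]  # reorder B tracks to align with A
    best_offset, best_score = 0, -np.inf
    # Convention (illustrative): off > 0 means video B lags video A by `off` frames.
    for off in range(-max_offset, max_offset + 1):
        a0, b0 = max(0, -off), max(0, off)      # start of overlapping window
        length = min(vel_a.shape[1] - a0, vel_b.shape[1] - b0)
        if length <= 1:
            continue
        # Motion-overlap score: correlation of matched velocities in the window.
        score = np.sum(vel_a[:, a0:a0 + length] * vel_b[:, b0:b0 + length]) / length
        if score > best_score:
            best_score, best_offset = score, off
    return best_offset
```

The FGW coupling is attractive here because it combines appearance similarity (the feature cost M) with intra-video geometric structure (C1, C2), so matches remain consistent even when individual track features are ambiguous; the coarse integer offset it enables is then what a sub-frame stage would refine.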
Related papers
- Tracking-Guided 4D Generation: Foundation-Tracker Motion Priors for 3D Model Animation [21.075786141331974]
We present Track4DGen, a framework for generating dynamic 4D objects from sparse inputs. In Stage One, we enforce dense, feature-level point correspondences inside the diffusion generator. In Stage Two, we reconstruct a dynamic 4D-GS using a hybrid motion encoding.
arXiv Detail & Related papers (2025-12-05T21:13:04Z) - Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image [88.71287865590273]
We introduce TrajScene-60K, a large-scale dataset of 60,000 video samples with dense point trajectories. We propose a diffusion-based 4D Scene Trajectory Generator (4D-STraG) to jointly generate geometrically consistent and motion-plausible 4D trajectories. We then propose a 4D View Synthesis Module (4D-Vi) to render videos with arbitrary camera trajectories from 4D point track representations.
arXiv Detail & Related papers (2025-12-04T17:59:10Z) - SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis [47.61773799705708]
We introduce SyncMV4D, the first model that jointly generates synchronized multi-view HOI videos and 4D motions. Our method demonstrates superior performance to state-of-the-art alternatives in visual realism, motion plausibility, and multi-view consistency.
arXiv Detail & Related papers (2025-11-24T17:14:19Z) - Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z) - ShapeGen4D: Towards High Quality 4D Shape Generation from Videos [85.45517487721257]
We introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. Our method accurately captures non-rigid motion, volume changes, and even topological transitions without per-frame optimization.
arXiv Detail & Related papers (2025-10-07T17:58:11Z) - In-2-4D: Inbetweening from Two Single-View Images to 4D Generation [63.68181731564576]
We propose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening from two single-view images. In contrast to video/4D generation from only text or a single image, our interpolative task can leverage more precise motion control to better constrain the generation.
arXiv Detail & Related papers (2025-04-11T09:01:09Z) - Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video [55.704264233274294]
We propose Deblur4DGS to reconstruct a high-quality 4D model from blurry monocular video. We reformulate the continuous dynamic representation within an exposure time as an exposure time estimation problem. Beyond novel-view synthesis, Deblur4DGS can be applied to improve blurry video from multiple perspectives.
arXiv Detail & Related papers (2024-12-09T12:02:11Z) - Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video [15.621374353364468]
Consistent4D is a novel approach for generating 4D dynamic objects from uncalibrated monocular videos.
We cast the 360-degree dynamic object reconstruction as a 4D generation problem, eliminating the need for tedious multi-view data collection and camera calibration.
arXiv Detail & Related papers (2023-11-06T03:26:43Z)