Split4D: Decomposed 4D Scene Reconstruction Without Video Segmentation
- URL: http://arxiv.org/abs/2512.22745v1
- Date: Sun, 28 Dec 2025 02:37:12 GMT
- Title: Split4D: Decomposed 4D Scene Reconstruction Without Video Segmentation
- Authors: Yongzhen Hu, Yihui Yang, Haotong Lin, Yifan Wang, Junting Dong, Yifu Deng, Xinyu Zhu, Fan Jia, Hujun Bao, Xiaowei Zhou, Sida Peng
- Abstract summary: We represent a decomposed 4D scene with Freetime FeatureGS and design a streaming feature learning strategy to accurately recover it from per-image segmentation maps. Experimental results on several datasets show that the reconstruction quality of our method outperforms recent methods by a large margin.
- Score: 76.21162972133534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of decomposed 4D scene reconstruction from multi-view videos. Recent methods achieve this by lifting video segmentation results to a 4D representation through differentiable rendering. As a result, they rely heavily on the quality of video segmentation maps, which are often unstable, leading to unreliable reconstructions. To overcome this challenge, our key idea is to represent the decomposed 4D scene with Freetime FeatureGS and to design a streaming feature learning strategy that accurately recovers it from per-image segmentation maps, eliminating the need for video segmentation. Freetime FeatureGS models the dynamic scene as a set of Gaussian primitives with learnable features and linear motion, allowing them to move to neighboring regions over time. We apply a contrastive loss to Freetime FeatureGS, forcing primitive features to be close or far apart depending on whether their projections belong to the same instance in the 2D segmentation map. Because the Gaussian primitives can move across time, this naturally extends feature learning to the temporal dimension, achieving 4D segmentation. Furthermore, we sample training observations in temporal order, enabling the streaming propagation of features over time and effectively avoiding local minima during optimization. Experimental results on several datasets show that the reconstruction quality of our method outperforms recent methods by a large margin.
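The abstract describes three concrete mechanisms: Gaussian primitives that carry learnable features and linear motion, a per-image contrastive loss over the features of primitives that project into the 2D segmentation map, and temporally ordered sampling of training observations. The PyTorch sketch below illustrates the first two. It is a minimal sketch under stated assumptions: the class and function names, the feature dimension, the linear motion model mu + velocity * t, and the InfoNCE-style form of the contrastive loss are our own illustrative choices, not the paper's released implementation.

```python
# Minimal sketch of feature-carrying Gaussians with linear motion and a
# per-frame contrastive loss over projected features. All names here
# (FreetimeGaussians, contrastive_feature_loss, tau) are hypothetical.
import torch
import torch.nn.functional as F


class FreetimeGaussians(torch.nn.Module):
    """Gaussian primitives with learnable features and linear motion:
    a primitive's position at time t is mu + velocity * t, letting it
    drift into neighboring regions over time."""

    def __init__(self, num_primitives: int, feat_dim: int = 16):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.randn(num_primitives, 3))        # base positions
        self.velocity = torch.nn.Parameter(torch.zeros(num_primitives, 3))  # linear motion
        self.features = torch.nn.Parameter(torch.randn(num_primitives, feat_dim))

    def positions(self, t: float) -> torch.Tensor:
        return self.mu + self.velocity * t


def contrastive_feature_loss(feats: torch.Tensor,
                             instance_ids: torch.Tensor,
                             tau: float = 0.1) -> torch.Tensor:
    """Pull together features of primitives whose projections land in the
    same 2D instance, and push apart features of different instances.

    feats:        (N, D) features of the primitives visible in this frame
    instance_ids: (N,)   instance label of the pixel each primitive hits
    """
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.T / tau                       # (N, N) scaled cosine similarity
    same = instance_ids[:, None] == instance_ids[None, :]
    eye = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    pos = same & ~eye                                 # positives: same instance, not self
    # InfoNCE-style objective: for each primitive, same-instance neighbors
    # should dominate the softmax over all other visible primitives.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")), dim=1, keepdim=True)
    per_primitive = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return per_primitive[pos.any(dim=1)].mean()


# Toy usage. In the real pipeline, visibility and instance ids would come from
# rasterizing positions(t) against the frame's segmentation map; random labels
# stand in for that here.
gs = FreetimeGaussians(num_primitives=1024)
ids = torch.randint(0, 8, (1024,))
loss = contrastive_feature_loss(gs.features, ids)
loss.backward()
```

The third mechanism, streaming feature learning, would then amount to iterating frames in temporal order when drawing these (features, instance ids) pairs, so that features optimized at frame t initialize the optimization at frame t+1 rather than fitting all timestamps at once.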
Related papers
- Dynamic Gaussian Scene Reconstruction from Unsynchronized Videos [31.54046494140498]
Multi-view video reconstruction plays a vital role in computer vision, enabling applications in film production, virtual reality, and motion analysis. We propose a novel temporal alignment strategy for high-quality 4DGS reconstruction from unsynchronized multi-view videos.
arXiv Detail & Related papers (2025-11-14T11:20:43Z) - Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image.<n>Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z) - Instant4D: 4D Gaussian Splatting in Minutes [8.897770973611427]
We present Instant4D, a monocular reconstruction system that processes casual video sequences within minutes, without calibrated cameras or depth sensors.<n>Our design significantly reduces redundancy while maintaining geometric integrity, cutting model size to under 10% of its original footprint.<n>Our method reconstructs a single video within 10 minutes on the Dycheck dataset or for a typical 200-frame video.
arXiv Detail & Related papers (2025-10-01T17:07:21Z) - 4D Driving Scene Generation With Stereo Forcing [62.47705572424127]
Current generative models struggle to synthesize dynamic 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS) without per-scene optimization.<n>We present PhiGenesis, a unified framework for 4D scene generation that extends video generation techniques with geometric and temporal consistency.
arXiv Detail & Related papers (2025-09-24T15:37:17Z) - CTRL-GS: Cascaded Temporal Residue Learning for 4D Gaussian Splatting [28.308077474731594]
We propose a novel extension to 4D Gaussian Splatting for dynamic scenes.<n>We decompose the dynamic scene into a "video-segment-frame" structure, with segments dynamically adjusted by optical flow.<n>We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
arXiv Detail & Related papers (2025-05-23T19:01:55Z) - Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image.<n>Our key insight is to distill pre-trained foundation models for consistent 4D scene representation.<n>The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z) - Representing Long Volumetric Video with Temporal Gaussian Hierarchy [80.51373034419379]
This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos.<n>We propose a novel 4D representation, named Temporal Gaussian Hierarchy, to compactly model long volumetric videos.<n>This work is the first approach capable of efficiently handling minutes of volumetric video data while maintaining state-of-the-art rendering quality.
arXiv Detail & Related papers (2024-12-12T18:59:34Z) - Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video [55.704264233274294]
We propose Deblur4DGS to reconstruct a high-quality 4D model from blurry monocular video.<n>We transform continuous dynamic representations within an exposure time into the exposure time estimation.<n>Beyond novel-view synthesis, Deblur4DGS can be applied to improve blurry video from multiple perspectives.
arXiv Detail & Related papers (2024-12-09T12:02:11Z) - Fast Encoder-Based 3D from Casual Videos via Point Track Processing [22.563073026889324]
We present TracksTo4D, a learning-based approach that infers 3D structure and camera positions from dynamic content in casual videos.
TracksTo4D is trained in an unsupervised way on a dataset of casual videos.
Experiments show that TracksTo4D can reconstruct a temporal point cloud and camera positions of the underlying video with accuracy comparable to state-of-the-art methods.
arXiv Detail & Related papers (2024-04-10T15:37:00Z)