Instant4D: 4D Gaussian Splatting in Minutes
- URL: http://arxiv.org/abs/2510.01119v1
- Date: Wed, 01 Oct 2025 17:07:21 GMT
- Title: Instant4D: 4D Gaussian Splatting in Minutes
- Authors: Zhanpeng Luo, Haoxi Ran, Li Lu
- Abstract summary: We present Instant4D, a monocular reconstruction system that processes casual video sequences within minutes, without calibrated cameras or depth sensors. Our design significantly reduces redundancy while maintaining geometric integrity, cutting model size to under 10% of its original footprint. Our method reconstructs a single video from the Dycheck dataset, or a typical 200-frame video, within 10 minutes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic view synthesis has seen significant advances, yet reconstructing scenes from uncalibrated, casual video remains challenging due to slow optimization and complex parameter estimation. In this work, we present Instant4D, a monocular reconstruction system that leverages a native 4D representation to efficiently process casual video sequences within minutes, without calibrated cameras or depth sensors. Our method begins with geometric recovery through deep visual SLAM, followed by grid pruning to optimize the scene representation. Our design significantly reduces redundancy while maintaining geometric integrity, cutting model size to under 10% of its original footprint. To handle temporal dynamics efficiently, we introduce a streamlined 4D Gaussian representation, achieving a 30x speed-up and reducing training time to within two minutes, while maintaining competitive performance across several benchmarks. Our method reconstructs a single video from the Dycheck dataset, or a typical 200-frame video, within 10 minutes. We further apply our model to in-the-wild videos, showcasing its generalizability. Our project website is published at https://instant4d.github.io/.
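The abstract's grid-pruning step can be illustrated with a minimal sketch: keep one representative point per occupied voxel, discarding redundant points while preserving coarse geometry. This is our own illustration of the general idea, not the authors' implementation; the function name and cell size are assumptions.

```python
# Hypothetical sketch of grid pruning (illustrative, not the Instant4D code):
# snap each point to a voxel key, then keep one point per occupied voxel.
import numpy as np

def grid_prune(points: np.ndarray, cell: float = 0.05) -> np.ndarray:
    """Keep one point per occupied voxel of edge length `cell`."""
    keys = np.floor(points / cell).astype(np.int64)
    # np.unique over rows returns the first index of each occupied voxel
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first_idx)]

pts = np.random.default_rng(0).normal(size=(10_000, 3))
pruned = grid_prune(pts, cell=0.2)
print(f"{len(pruned)} of {len(pts)} points kept")
```

Coarser cells prune more aggressively; in a real pipeline the surviving points would seed the Gaussian primitives that are then optimized.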
Related papers
- Split4D: Decomposed 4D Scene Reconstruction Without Video Segmentation [76.21162972133534]
We represent a decomposed 4D scene with Freetime FeatureGS. We design a streaming feature learning strategy to accurately recover it from per-image segmentation maps. Experimental results on several datasets show that the reconstruction quality of our method outperforms recent methods by a large margin.
arXiv Detail & Related papers (2025-12-28T02:37:12Z) - Efficiently Reconstructing Dynamic Scenes One D4RT at a Time [54.67332582569525]
This paper introduces D4RT, a simple yet powerful feedforward model designed to efficiently solve this task. Our decoding interface allows the model to independently and flexibly probe the 3D position of any point in space and time. We demonstrate that our approach sets a new state of the art, outperforming previous methods across a wide spectrum of 4D reconstruction tasks.
arXiv Detail & Related papers (2025-12-09T18:57:21Z) - Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z) - 4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time [74.07107064085409]
4D-LRM is the first large-scale 4D reconstruction model that takes input from unconstrained views and timestamps and renders arbitrary view-time combinations. It learns a unified space-time representation and directly predicts per-pixel 4D Gaussian primitives from posed image tokens across time. It reconstructs 24-frame sequences in one forward pass in under 1.5 seconds on a single A100 GPU.
arXiv Detail & Related papers (2025-06-23T17:57:47Z) - 4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos [29.061337554486897]
We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction. Using 4D Gaussians as an inductive bias, 4DGT unifies static and dynamic components. Our model processes 64 consecutive posed frames in a rolling-window fashion, predicting consistent 4D Gaussians in the scene.
arXiv Detail & Related papers (2025-06-09T17:59:59Z) - Representing Long Volumetric Video with Temporal Gaussian Hierarchy [80.51373034419379]
This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos. We propose a novel 4D representation, named Temporal Gaussian Hierarchy, to compactly model long volumetric videos. This work is the first approach capable of efficiently handling minutes of volumetric video data while maintaining state-of-the-art rendering quality.
arXiv Detail & Related papers (2024-12-12T18:59:34Z) - Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting [14.759265492381509]
We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. It includes the extraction of 2D point features that robustly represent 3D structure. Results show significant improvements over state-of-the-art methods for 4D novel view synthesis.
arXiv Detail & Related papers (2024-06-03T06:52:35Z) - Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video [42.10482273572879]
We propose an efficient video-to-4D object generation framework called Efficient4D. It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data. Experiments on both synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed.
arXiv Detail & Related papers (2024-01-16T18:58:36Z) - Splatter Image: Ultra-Fast Single-View 3D Reconstruction [67.96212093828179]
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.
We learn a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS.
On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works.
arXiv Detail & Related papers (2023-12-20T16:14:58Z) - 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering [103.32717396287751]
We propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes.
A neural voxel encoding algorithm inspired by HexPlane is proposed to efficiently build features from 4D neural voxels.
Our 4D-GS method achieves real-time rendering at high resolutions: 82 FPS at 800×800 resolution on an RTX 3090 GPU.
arXiv Detail & Related papers (2023-10-12T17:21:41Z) - NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields [99.57774680640581]
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Points in the 4D space are associated with probabilities belonging to three categories: static, deforming, and new areas.
arXiv Detail & Related papers (2022-10-28T07:11:05Z)
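NeRFPlayer's three-way decomposition, where each point in 4D space carries probabilities of being static, deforming, or newly appearing, can be sketched as a per-point softmax over category logits. This is our own illustration of the idea, not the authors' code; the names are assumptions.

```python
# Illustrative sketch (not NeRFPlayer's implementation): each 4D point has
# three logits that a softmax turns into a probability distribution over
# the static / deforming / new categories.
import numpy as np

CATEGORIES = ("static", "deforming", "new")

def category_probs(logits: np.ndarray) -> dict:
    """Numerically stable softmax over the three per-point category logits."""
    z = np.exp(logits - logits.max())
    p = z / z.sum()
    return dict(zip(CATEGORIES, p.tolist()))

probs = category_probs(np.array([2.0, 0.5, -1.0]))
print(max(probs, key=probs.get))  # the dominant category for this point
```

In the actual method these probabilities would come from a learned field and steer each point toward the matching representation (static, deformation, or per-frame).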
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.