Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos
- URL: http://arxiv.org/abs/2509.24209v1
- Date: Mon, 29 Sep 2025 02:47:14 GMT
- Title: Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos
- Authors: Yingdong Hu, Yisheng He, Jinnan Chen, Weihao Yuan, Kejie Qiu, Zehong Lin, Siyu Zhu, Zilong Dong, Jun Zhang,
- Abstract summary: We present a feed-forward 4D human reconstruction and interpolation model that efficiently reconstructs temporally aligned representations from uncalibrated sparse-view videos. For novel time synthesis, we design a novel motion prediction module to predict dense motions for each 3D Gaussian between two adjacent frames. Experiments demonstrate the effectiveness of our model on both in-domain and out-of-domain datasets.
- Score: 27.595035122927204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instant reconstruction of dynamic 3D humans from uncalibrated sparse-view videos is critical for numerous downstream applications. Existing methods, however, are either limited by slow reconstruction speeds or incapable of generating novel-time representations. To address these challenges, we propose Forge4D, a feed-forward 4D human reconstruction and interpolation model that efficiently reconstructs temporally aligned representations from uncalibrated sparse-view videos, enabling both novel view and novel time synthesis. Our model simplifies the 4D reconstruction and interpolation problem into a joint task of streaming 3D Gaussian reconstruction and dense motion prediction. For the task of streaming 3D Gaussian reconstruction, we first reconstruct static 3D Gaussians from uncalibrated sparse-view images and then introduce learnable state tokens to enforce temporal consistency in a memory-friendly manner by interactively updating shared information across different timestamps. For novel time synthesis, we design a novel motion prediction module to predict dense motions for each 3D Gaussian between two adjacent frames, coupled with an occlusion-aware Gaussian fusion process to interpolate 3D Gaussians at arbitrary timestamps. To overcome the lack of ground truth for dense motion supervision, we formulate dense motion prediction as a dense point matching task and introduce a self-supervised retargeting loss to optimize this module. An additional occlusion-aware optical flow loss is introduced to ensure motion consistency with plausible human movement, providing stronger regularization. Extensive experiments demonstrate the effectiveness of our model on both in-domain and out-of-domain datasets. Project page and code at: https://zhenliuzju.github.io/huyingdong/Forge4D.
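As a rough illustration of the novel-time synthesis step described in the abstract, the sketch below interpolates 3D Gaussian centers at an arbitrary timestamp by warping the two adjacent frames along predicted dense motions and fusing them with occlusion-aware weights. This is a minimal sketch under assumed interfaces, not the authors' code: the names (interpolate_gaussians, motion_fwd, vis_t, etc.) are hypothetical, and the actual Forge4D module also handles rotations, scales, and appearance attributes, which are omitted here.

```python
# Hedged sketch of occlusion-aware interpolation of 3D Gaussian centers at a
# fractional timestamp tau in [0, 1] between two adjacent frames.
# All variable and function names are illustrative assumptions.
import numpy as np

def interpolate_gaussians(centers_t, centers_t1, motion_fwd, motion_bwd,
                          vis_t, vis_t1, tau):
    """Blend Gaussian centers from frames t and t+1 at fractional time tau.

    centers_t, centers_t1 : (N, 3) Gaussian centers at frames t and t+1
    motion_fwd            : (N, 3) predicted dense displacement from t to t+1
    motion_bwd            : (N, 3) predicted dense displacement from t+1 to t
    vis_t, vis_t1         : (N,) occlusion-aware visibility weights in [0, 1]
    tau                   : scalar in [0, 1], the query timestamp
    """
    # Warp each frame's Gaussians toward the query time along the dense motion.
    warped_from_t = centers_t + tau * motion_fwd
    warped_from_t1 = centers_t1 + (1.0 - tau) * motion_bwd

    # Occlusion-aware fusion: weight each source by its visibility and by
    # temporal proximity to the query time, then normalize.
    w_t = (1.0 - tau) * vis_t
    w_t1 = tau * vis_t1
    denom = np.clip(w_t + w_t1, 1e-6, None)[:, None]
    return (w_t[:, None] * warped_from_t + w_t1[:, None] * warped_from_t1) / denom
```

In this reading, the visibility terms play the role of the occlusion-aware fusion weights: a Gaussian that is occluded in one frame contributes little from that side, so the interpolation leans on the frame where it is actually observed.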
Related papers
- MoRel: Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification [10.799902862870288]
MoRel is a novel framework for temporally consistent and memory-efficient modeling of long-range dynamic scenes. Our approach mitigates temporal discontinuities and flickering artifacts. It achieves temporally coherent and flicker-free long-range 4D reconstruction while maintaining bounded memory usage.
arXiv Detail & Related papers (2025-12-10T02:49:09Z) - Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z) - Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline [64.42938561167402]
We propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module. This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed. Our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.
arXiv Detail & Related papers (2025-08-06T16:16:58Z) - Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models [83.76517697509156]
This paper addresses the challenge of high-fidelity view synthesis of humans with sparse-view videos as input. We propose a novel iterative sliding denoising process to enhance the spatio-temporal consistency of the 4D diffusion model. Our method is able to synthesize high-quality and consistent novel-view videos and significantly outperforms existing approaches.
arXiv Detail & Related papers (2025-07-17T17:59:17Z) - FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction [64.30050475414947]
FreeTimeGS is a novel 4D representation that allows Gaussian primitives to appear at arbitrary times and locations. Our representation possesses strong flexibility, thus improving the ability to model dynamic 3D scenes. Experimental results on several datasets show that the rendering quality of our method outperforms recent methods by a large margin.
arXiv Detail & Related papers (2025-06-05T17:59:57Z) - S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points [30.46796069720543]
We introduce a novel approach for streaming 4D real-world reconstruction utilizing discrete 3D control points.
This method physically models local rays and establishes a motion-decoupling coordinate system.
By effectively merging traditional graphics with learnable pipelines, it provides a robust and efficient local 6-degrees-of-freedom (6 DoF) motion representation.
arXiv Detail & Related papers (2024-08-23T12:51:49Z) - RoHM: Robust Human Motion Reconstruction via Diffusion [58.63706638272891]
RoHM is an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos.
Conditioned on noisy and occluded input data, it reconstructs complete, plausible motions in consistent global coordinates.
Our method outperforms state-of-the-art approaches qualitatively and quantitatively, while being faster at test time.
arXiv Detail & Related papers (2024-01-16T18:57:50Z) - Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking [52.393359791978035]
Motion2VecSets is a 4D diffusion model for dynamic surface reconstruction from point cloud sequences.
We parameterize 4D dynamics with latent sets instead of using global latent codes.
For more temporally-coherent object tracking, we synchronously denoise deformation latent sets and exchange information across multiple frames.
arXiv Detail & Related papers (2024-01-12T15:05:08Z) - Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle [9.082693946898733]
We introduce a novel point-based approach for fast dynamic scene reconstruction and real-time rendering from both multi-view and monocular videos.
In contrast to the prevalent NeRF-based approaches hampered by slow training and rendering speeds, our approach harnesses recent advancements in point-based 3D Gaussian Splatting (3DGS).
Our proposed approach showcases a substantial efficiency improvement, achieving a $5\times$ faster training speed compared to per-frame 3DGS modeling.
arXiv Detail & Related papers (2023-12-06T11:25:52Z) - Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video [15.621374353364468]
Consistent4D is a novel approach for generating 4D dynamic objects from uncalibrated monocular videos.
We cast the 360-degree dynamic object reconstruction as a 4D generation problem, eliminating the need for tedious multi-view data collection and camera calibration.
arXiv Detail & Related papers (2023-11-06T03:26:43Z) - OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction [14.130915525776055]
RGBD-based real-time dynamic 3D reconstruction suffers from inaccurate inter-frame motion estimation.
We propose OcclusionFusion, a novel method to calculate occlusion-aware 3D motion to guide the reconstruction.
Our technique outperforms existing single-view-based real-time methods by a large margin.
arXiv Detail & Related papers (2022-03-15T15:09:01Z)