4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere
- URL: http://arxiv.org/abs/2602.10094v1
- Date: Tue, 10 Feb 2026 18:57:04 GMT
- Title: 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere
- Authors: Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy
- Abstract summary: We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics.
- Score: 77.83037497484366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere and anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form by decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.
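The encode-once, query-anywhere-and-anytime paradigm described above can be sketched in a few lines. The snippet below is a minimal toy illustration, not the paper's implementation: the transformer backbone and conditional decoder are stood in for by fixed random linear maps, and all dimensions (`T`, `H`, `W`, `C`, `D`) are hypothetical. It only shows the control flow the abstract describes: one encoding pass over the video, then cheap queries of per-pixel 4D attributes, factorized into base geometry plus time-dependent relative motion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical; the paper does not specify dimensions).
T, H, W, C = 8, 4, 4, 3   # frames, height, width, channels
D = 16                     # latent width

# Stand-ins for learned weights: the backbone and the two decoder heads
# (base geometry, time-dependent relative motion).
W_enc = rng.standard_normal((C, D)) / np.sqrt(C)
W_geo = rng.standard_normal((D, 3)) / np.sqrt(D)
W_mot = rng.standard_normal((D, 3)) / np.sqrt(D)

def encode_video(video):
    """Encode once: compress the whole video into a spatio-temporal latent."""
    return video.reshape(T, H * W, C) @ W_enc          # (T, H*W, D)

def query(latent, frame_idx, t):
    """Query anywhere/anytime: decode per-pixel 4D attributes for any
    query frame at any target timestamp t, in the minimally factorized
    form base_geometry + time-dependent relative motion."""
    tokens = latent[frame_idx]                         # (H*W, D)
    base_geometry = tokens @ W_geo                     # static 3D points
    relative_motion = t * (tokens @ W_mot)             # motion offset at t
    return base_geometry + relative_motion

video = rng.standard_normal((T, H, W, C))
latent = encode_video(video)            # one forward pass over the video
p0 = query(latent, frame_idx=2, t=0.0)  # frame 2 at its base time
p1 = query(latent, frame_idx=2, t=1.0)  # same pixels displaced to t = 1
print(p0.shape)  # (16, 3): one 3D point per pixel of the query frame
```

Note that at `t = 0.0` the motion term vanishes, so the query returns the base geometry alone; querying other timestamps reuses the same latent without re-encoding the video, which is the efficiency argument the abstract makes.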
Related papers
- Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis [53.48281548500864]
Motion 3-to-4 is a feed-forward framework for synthesising high-quality 4D dynamic objects from a single monocular video. Our model learns a compact motion latent representation and predicts per-frame trajectories to robustly recover complete, temporally coherent geometry.
arXiv Detail & Related papers (2026-01-20T18:59:48Z) - Any4D: Unified Feed-Forward Metric 4D Reconstruction [39.62006179006032]
We present Any4D, a scalable multi-view transformer for metric-scale, dense feed-forward 4D reconstruction. Any4D directly generates per-pixel motion and geometry predictions for N frames. We achieve superior performance across diverse setups, both in terms of accuracy (2-3X lower error) and compute efficiency (15X faster).
arXiv Detail & Related papers (2025-12-11T18:57:39Z) - Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image [88.71287865590273]
We introduce TrajScene-60K, a large-scale dataset of 60,000 video samples with dense point trajectories. We propose a diffusion-based 4D Scene Trajectory Generator (4D-STraG) to jointly generate geometrically consistent and motion-plausible 4D trajectories. We then propose a 4D View Synthesis Module (4D-Vi) to render videos with arbitrary camera trajectories from 4D point track representations.
arXiv Detail & Related papers (2025-12-04T17:59:10Z) - 4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time [74.07107064085409]
4D-LRM is the first large-scale 4D reconstruction model that takes input from unconstrained views and timestamps and renders arbitrary view-time combinations. It learns a unified space-time representation and directly predicts per-pixel 4D Gaussian primitives from posed image tokens across time. It reconstructs 24-frame sequences in one forward pass in less than 1.5 seconds on a single A100 GPU.
arXiv Detail & Related papers (2025-06-23T17:57:47Z) - Can Video Diffusion Model Reconstruct 4D Geometry? [66.5454886982702]
Sora3R is a novel framework that taps into the rich temporal priors of large dynamic video diffusion models to infer 4D pointmaps from casual videos. Experiments demonstrate that Sora3R reliably recovers both camera poses and detailed scene geometry, achieving performance on par with state-of-the-art methods for dynamic 4D reconstruction.
arXiv Detail & Related papers (2025-03-27T01:44:46Z) - Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image. Our key insight is to distill pre-trained foundation models for consistent 4D scene representation. The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.