4D Primitive-Mâché: Glueing Primitives for Persistent 4D Scene Reconstruction
- URL: http://arxiv.org/abs/2512.16564v1
- Date: Thu, 18 Dec 2025 14:06:15 GMT
- Title: 4D Primitive-Mâché: Glueing Primitives for Persistent 4D Scene Reconstruction
- Authors: Kirill Mazur, Marwan Taher, Andrew J. Davison
- Abstract summary: We present a dynamic reconstruction system that receives a casual monocular RGB video as input, and outputs a persistent reconstruction of the scene. In other words, we reconstruct not only the currently visible parts of the scene, but also all previously viewed parts, which enables replaying the complete reconstruction across all timesteps.
- Score: 28.50411933478524
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present a dynamic reconstruction system that receives a casual monocular RGB video as input, and outputs a complete and persistent reconstruction of the scene. In other words, we reconstruct not only the currently visible parts of the scene, but also all previously viewed parts, which enables replaying the complete reconstruction across all timesteps. Our method decomposes the scene into a set of rigid 3D primitives, which are assumed to be moving throughout the scene. Using estimated dense 2D correspondences, we jointly infer the rigid motion of these primitives through an optimisation pipeline, yielding a 4D reconstruction of the scene, i.e. providing 3D geometry dynamically moving through time. To achieve this, we also introduce a mechanism to extrapolate motion for objects that become invisible, employing motion-grouping techniques to maintain continuity. The resulting system enables 4D spatio-temporal awareness, offering capabilities such as replayable 3D reconstructions of articulated objects through time, multi-object scanning, and object permanence. On object scanning and multi-object datasets, our system significantly outperforms existing methods both quantitatively and qualitatively.
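The abstract describes jointly inferring the rigid motion of 3D primitives from dense 2D correspondences. A minimal sketch of the standard building block such pipelines typically rest on (not the paper's actual optimisation, which is joint over all primitives and timesteps): the closed-form Kabsch/Umeyama least-squares fit of a single SE(3) motion to a set of 3D point correspondences. The function name is illustrative.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rigid (SE(3)) alignment of corresponding 3D point sets.

    Estimates rotation R and translation t minimising ||R @ src_i + t - dst_i||^2
    via the Kabsch/Umeyama algorithm (SVD of the cross-covariance), with a
    determinant correction so R is a proper rotation, never a reflection.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src - src.mean(axis=0)          # centre both point clouds
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # +1 rotation, -1 reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

In a primitive-based system, a fit like this (or an iterative equivalent) would be applied per primitive and per timestep, with the 3D correspondences lifted from the estimated dense 2D matches and depth.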
Related papers
- MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer [45.19539316971492]
MoRe is a feedforward 4D reconstruction network that efficiently recovers dynamic 3D scenes from monocular videos. Built upon a strong static reconstruction backbone, MoRe employs an attention-forcing strategy to disentangle dynamic motion from static structure. Experiments on multiple benchmarks demonstrate that MoRe achieves high-quality dynamic reconstructions with exceptional efficiency.
arXiv Detail & Related papers (2026-03-05T11:51:07Z)
- 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere [77.83037497484366]
We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics.
arXiv Detail & Related papers (2026-02-10T18:57:04Z)
- Inferring Compositional 4D Scenes without Ever Seeing One [58.81854043690171]
We propose a method that consistently predicts the structure and spatio-temporal configuration of 4D/3D objects. We achieve this by carefully designed training of several spatial and temporal attentions on 2D video input. By alternating between spatial and temporal reasoning, COM4D reconstructs complete and composed scenes.
arXiv Detail & Related papers (2025-12-04T21:51:47Z)
- C4D: 4D Made from 3D through Dual Correspondences [77.04731692213663]
We introduce C4D, a framework that leverages temporal correspondences to extend existing 3D reconstruction formulations to 4D. C4D captures two types of correspondences: short-term optical flow and long-term point tracking. We train a dynamic-aware point tracker that provides additional mobility information.
arXiv Detail & Related papers (2025-10-16T17:59:06Z)
- D$^2$USt3R: Enhancing 3D Reconstruction for Dynamic Scenes [54.886845755635754]
This work addresses the task of 3D reconstruction in dynamic scenes, where object motions frequently degrade the quality of previous 3D pointmap regression methods. By explicitly incorporating both spatial and temporal aspects, our approach successfully encapsulates 3D dense correspondence into the proposed pointmaps.
arXiv Detail & Related papers (2025-04-08T17:59:50Z)
- WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes [65.76371201992654]
We propose a novel 4D reconstruction benchmark, WideRange4D. This benchmark includes rich 4D scene data with large spatial variations, allowing for a more comprehensive evaluation of the generation capabilities of 4D generation methods. We also introduce a new 4D reconstruction method, Progress4D, which generates stable and high-quality 4D results across various complex 4D scene reconstruction tasks.
arXiv Detail & Related papers (2025-03-17T17:58:18Z)
- 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives [115.67081491747943]
Dynamic 3D scene representation and novel view synthesis are crucial for enabling AR/VR and metaverse applications. We reformulate the reconstruction of a time-varying 3D scene as approximating its underlying 4D volume. We derive several compact variants that effectively reduce the memory footprint to address its storage bottleneck.
arXiv Detail & Related papers (2024-12-30T05:30:26Z)
- T-3DGS: Removing Transient Objects for 3D Scene Reconstruction [83.05271859398779]
Transient objects in video sequences can significantly degrade the quality of 3D scene reconstructions. We propose T-3DGS, a novel framework that robustly filters out transient distractors during 3D reconstruction using Gaussian Splatting.
arXiv Detail & Related papers (2024-11-29T07:45:24Z)
- Shape of Motion: 4D Reconstruction from a Single Video [42.42669078777769]
We introduce a method for reconstructing generic dynamic scenes, featuring explicit, persistent 3D motion trajectories in the world coordinate frame. First, we exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases. Second, we take advantage of off-the-shelf data-driven priors such as monocular depth maps and long-range 2D tracks, and devise a method to effectively consolidate these noisy supervisory signals.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
- Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting [8.078460597825142]
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics.
We propose to approximate the underlying spatio-temporal rendering volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling.
Our model is conceptually simple, consisting of 4D Gaussians parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficients of 4D spherindrical harmonics.
arXiv Detail & Related papers (2023-10-16T17:57:43Z)
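The last two entries model a scene as anisotropic space-time (4D) Gaussians. A minimal sketch of that primitive, under stated assumptions (it omits the spherindrical-harmonic appearance model, and the function names are illustrative, not from any of the papers above): evaluating the unnormalized density at a space-time point, and recovering the 3D center at a given time by standard Gaussian conditioning, which shows how a nonzero space-time covariance makes the primitive translate over time rather than merely fade in and out.

```python
import numpy as np

def eval_4d_gaussian(x, mean, cov):
    """Unnormalized density of an anisotropic 4D (x, y, z, t) Gaussian.

    `x` and `mean` are shape-(4,) space-time points; `cov` is a 4x4 SPD
    covariance. Off-diagonal space-time terms tilt the ellipsoid in time.
    """
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

def center_at_time(mean, cov, t):
    """3D center of the Gaussian conditioned on time t.

    Standard conditioning of a joint Gaussian on its last (time) coordinate:
    the spatial mean shifts linearly in t along the space-time covariance.
    """
    mu_s, mu_t = mean[:3], mean[3]
    cov_st = cov[:3, 3]   # covariance between spatial coords and time
    var_t = cov[3, 3]
    return mu_s + cov_st * (t - mu_t) / var_t
```

The conditioning step makes the dynamics explicit: with an isotropic covariance the primitive is static, while a covariance coupling x and t moves its time-slice center along x at a constant velocity.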
This list is automatically generated from the titles and abstracts of the papers in this site.