Mem4D: Decoupling Static and Dynamic Memory for Dynamic Scene Reconstruction
- URL: http://arxiv.org/abs/2508.07908v2
- Date: Tue, 12 Aug 2025 11:13:49 GMT
- Title: Mem4D: Decoupling Static and Dynamic Memory for Dynamic Scene Reconstruction
- Authors: Xudong Cai, Shuo Wang, Peng Wang, Yongcai Wang, Zhaoxin Fan, Wanting Li, Tianbao Zhang, Jianrong Tao, Yeying Jin, Deying Li
- Abstract summary: We propose a novel framework that decouples the modeling of static geometry and dynamic motion. Mem4D simultaneously maintains static geometry with global consistency and reconstructs dynamic elements with high fidelity.
- Score: 17.587320705104343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing dense geometry for dynamic scenes from a monocular video is a critical yet challenging task. Recent memory-based methods enable efficient online reconstruction, but they fundamentally suffer from a Memory Demand Dilemma: the memory representation faces an inherent conflict between the long-term stability required for static structures and the rapid, high-fidelity detail retention needed for dynamic motion. This conflict forces existing methods into a compromise, leading to either geometric drift in static structures or blurred, inaccurate reconstructions of dynamic objects. To address this dilemma, we propose Mem4D, a novel framework that decouples the modeling of static geometry and dynamic motion. Guided by this insight, we design a dual-memory architecture: 1) the Transient Dynamics Memory (TDM) focuses on capturing high-frequency motion details from recent frames, enabling accurate and fine-grained modeling of dynamic content; 2) the Persistent Structure Memory (PSM) compresses and preserves long-term spatial information, ensuring global consistency and drift-free reconstruction for static elements. By alternating queries to these specialized memories, Mem4D simultaneously maintains static geometry with global consistency and reconstructs dynamic elements with high fidelity. Experiments on challenging benchmarks demonstrate that our method achieves state-of-the-art or competitive performance while maintaining high efficiency. Code will be publicly available.
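To make the alternating-query idea concrete, the following is a minimal sketch of how a dual-memory read could be wired up. Since the code is not yet released, everything here is an illustrative assumption consistent with the abstract: the FIFO transient bank, the fixed-size persistent bank, and the two cross-attention reads are hypothetical design choices, not the authors' implementation.

```python
# Hypothetical sketch of a dual-memory read inspired by the Mem4D abstract.
# The names mirror the paper's terminology (TDM, PSM), but all shapes,
# layers, and update rules are illustrative assumptions.
import torch
import torch.nn as nn


class DualMemoryRead(nn.Module):
    def __init__(self, dim: int = 256, tdm_len: int = 4, psm_slots: int = 64):
        super().__init__()
        # TDM: short FIFO of recent-frame tokens (high-frequency motion detail).
        self.tdm_len = tdm_len
        # PSM: fixed-size bank of compressed tokens (long-term static structure).
        self.psm = nn.Parameter(torch.randn(psm_slots, dim) * 0.02)
        self.read_tdm = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.read_psm = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, frame_tokens: torch.Tensor, tdm_bank: list[torch.Tensor]):
        # frame_tokens: (B, N, dim) features of the current frame.
        # 1) Query the transient memory for fine-grained dynamic content.
        if tdm_bank:
            recent = torch.cat(tdm_bank[-self.tdm_len:], dim=1)  # (B, T*N, dim)
            frame_tokens = frame_tokens + self.read_tdm(
                frame_tokens, recent, recent)[0]
        # 2) Query the persistent memory for drift-free static geometry.
        psm = self.psm.unsqueeze(0).expand(frame_tokens.size(0), -1, -1)
        frame_tokens = frame_tokens + self.read_psm(frame_tokens, psm, psm)[0]
        # 3) Push the refined tokens into the FIFO for the next frame.
        tdm_bank.append(frame_tokens.detach())
        return frame_tokens, tdm_bank[-self.tdm_len:]
```

The point the sketch tries to capture is that the transient bank is short and refreshed every frame, so it tracks fast motion, while the persistent bank has a fixed slot budget that recent frames cannot evict, which is what would protect static geometry from drift.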
Related papers
- MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer [45.19539316971492]
MoRe is a feed-forward 4D reconstruction network that efficiently recovers dynamic 3D scenes from monocular videos. Built upon a strong static reconstruction backbone, MoRe employs an attention-forcing strategy to disentangle dynamic motion from static structure. Experiments on multiple benchmarks demonstrate that MoRe achieves high-quality dynamic reconstructions with exceptional efficiency.
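The abstract does not specify the attention-forcing mechanism. One plausible reading, sketched below purely as an assumption, is a soft mask built from a per-token dynamic score that biases attention toward same-group token pairs, so motion features do not leak into static structure.

```python
# Hypothetical "attention forcing" via a soft same-group bias; the scoring
# and mask construction are assumptions, not MoRe's actual mechanism.
import torch

def forced_attention(q, k, v, dynamic_score, strength: float = 4.0):
    # q, k, v: (B, N, D); dynamic_score: (B, N) in [0, 1], higher = more dynamic.
    logits = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5         # (B, N, N)
    # 1 where both tokens share a dynamic/static label, 0 when they disagree.
    same_group = 1.0 - (dynamic_score.unsqueeze(2)
                        - dynamic_score.unsqueeze(1)).abs()      # (B, N, N)
    logits = logits + strength * (same_group - 1.0)              # penalize cross-group
    return torch.softmax(logits, dim=-1) @ v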
arXiv Detail & Related papers (2026-03-05T11:51:07Z)
- LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory [97.14005794889134]
We present LoGeR, a novel architecture that scales dense 3D reconstruction to extremely long sequences without post-optimization. LoGeR processes video streams in chunks, leveraging strong bidirectional priors for high-fidelity intra-chunk reasoning. This memory architecture enables LoGeR to be trained on sequences of 128 frames and to generalize to thousands of frames during inference.
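A hedged sketch of the chunked-streaming pattern described above; `encode_chunk` and `update_memory` are hypothetical stand-ins for the paper's intra-chunk model and hybrid-memory update, not its API.

```python
# Illustrative chunked-streaming loop in the spirit of the LoGeR abstract.
import torch

def reconstruct_stream(frames: torch.Tensor, encode_chunk, update_memory,
                       chunk_size: int = 16):
    """frames: (T, C, H, W). Process the stream chunk by chunk so the cost of
    bidirectional attention stays bounded, while a compact memory carries
    long-range context across chunk boundaries."""
    memory = None
    outputs = []
    for start in range(0, frames.size(0), chunk_size):
        chunk = frames[start:start + chunk_size]
        # Bidirectional reasoning is confined to the chunk ...
        pred, feats = encode_chunk(chunk, memory)
        # ... while the memory links chunks over thousands of frames.
        memory = update_memory(memory, feats)
        outputs.append(pred)
    return torch.cat(outputs, dim=0)
```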
arXiv Detail & Related papers (2026-03-03T18:55:37Z)
- RELIC: Interactive Video World Model with Long-Horizon Memory [74.81433479334821]
A truly interactive world model requires real-time long-horizon streaming, consistent spatial memory, and precise user control. We present RELIC, a unified framework that tackles these three challenges altogether. Given a single image and a text description, RELIC enables memory-aware, long-duration exploration of arbitrary scenes in real time.
arXiv Detail & Related papers (2025-12-03T18:29:20Z)
- OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects [58.38338242973447]
OnlineSplatter is a novel framework that generates high-quality, object-centric 3D Gaussians directly from RGB frames. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field. Our core contribution is a dual-key memory module combining latent appearance-geometry keys with explicit directional keys.
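The dual-key lookup might be realized along the following lines: each memory slot is scored both by a latent appearance-geometry key and by an explicit viewing-direction key. The shapes and the additive fusion of the two scores are assumptions, not the paper's design.

```python
# Hypothetical dual-key memory read: latent key + explicit directional key.
import torch
import torch.nn.functional as F

def dual_key_read(query_feat, query_dir, mem_feat_keys, mem_dir_keys, mem_values):
    # query_feat: (B, D)  latent appearance-geometry query
    # query_dir:  (B, 3)  viewing direction of the current frame
    # mem_*_keys: (M, D) / (M, 3); mem_values: (M, D)
    latent_score = query_feat @ mem_feat_keys.t()                 # (B, M)
    dir_score = F.normalize(query_dir, dim=-1) @ F.normalize(
        mem_dir_keys, dim=-1).t()                                 # (B, M)
    weights = torch.softmax(latent_score + dir_score, dim=-1)
    return weights @ mem_values                                   # (B, D)
```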
arXiv Detail & Related papers (2025-10-23T14:37:25Z)
- SplitGaussian: Reconstructing Dynamic Scenes via Visual Geometry Decomposition [14.381223353489062]
We propose SplitGaussian, a novel framework that explicitly decomposes scene representations into static and dynamic components. SplitGaussian outperforms prior state-of-the-art methods in rendering quality, geometric stability, and motion separation.
arXiv Detail & Related papers (2025-08-06T09:00:13Z)
- SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction [21.822062121612166]
This work introduces SDD-4DGS, the first framework for static-dynamic decoupled 4D scene reconstruction based on Gaussian Splatting. Our approach is built upon a novel probabilistic dynamic perception coefficient that is naturally integrated into the Gaussian reconstruction pipeline. Experiments on five benchmark datasets demonstrate that SDD-4DGS consistently outperforms state-of-the-art methods in reconstruction fidelity.
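A minimal sketch of what a per-Gaussian probabilistic dynamic coefficient could look like: a learnable probability that gates how much time-dependent deformation each Gaussian receives. The sigmoid gating of a deformation field is one plausible integration, not the paper's exact formulation.

```python
# Hypothetical per-Gaussian dynamic-perception coefficient.
import torch
import torch.nn as nn

class DynamicCoefficient(nn.Module):
    def __init__(self, num_gaussians: int):
        super().__init__()
        # One learnable logit per Gaussian; sigmoid -> P(dynamic).
        self.logit = nn.Parameter(torch.zeros(num_gaussians))

    def forward(self, means: torch.Tensor, deform: torch.Tensor):
        # means/deform: (N, 3). Static Gaussians (p ~ 0) keep their position;
        # dynamic ones (p ~ 1) follow the predicted deformation field.
        p = torch.sigmoid(self.logit).unsqueeze(-1)  # (N, 1)
        return means + p * deform
```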
arXiv Detail & Related papers (2025-03-12T12:25:58Z)
- Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos [101.48581851337703]
We present BTimer, the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes. Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames. Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150 ms while reaching state-of-the-art performance on both static and dynamic scene datasets.
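One way to read "aggregating information from all the context frames" at a target timestamp is a time-conditioned attention pool, sketched below; the time encoding and single-query design are assumptions, not BTimer's architecture.

```python
# Hypothetical bullet-time aggregation: context-frame tokens pooled with
# weights conditioned on the target timestamp.
import torch
import torch.nn as nn

class BulletTimePool(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, ctx_tokens, ctx_times, t):
        # ctx_tokens: (B, T, dim); ctx_times: (B, T, 1); t: (B, 1) target time.
        keys = ctx_tokens + self.time_mlp(ctx_times)   # tag tokens with their time
        query = self.time_mlp(t).unsqueeze(1)          # (B, 1, dim) bullet query
        pooled, _ = self.attn(query, keys, keys)
        return pooled.squeeze(1)                       # scene code at time t
```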
arXiv Detail & Related papers (2024-12-04T18:15:06Z)
- UrbanGS: Semantic-Guided Gaussian Splatting for Urban Scene Reconstruction [86.4386398262018]
UrbanGS uses 2D semantic maps and an existing dynamic Gaussian approach to distinguish static objects from the scene. For potentially dynamic objects, we aggregate temporal information using learnable time embeddings. Our approach outperforms state-of-the-art methods in reconstruction quality and efficiency.
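The learnable time embeddings could be applied roughly as below, where a semantic dynamic mask decides which points receive the time code; the lookup and additive fusion are illustrative assumptions.

```python
# Hypothetical time-embedding fusion gated by a semantic dynamic mask.
import torch
import torch.nn as nn

class TimeConditionedFeature(nn.Module):
    def __init__(self, num_frames: int, dim: int = 64):
        super().__init__()
        self.time_embed = nn.Embedding(num_frames, dim)

    def forward(self, feat, frame_idx, is_dynamic):
        # feat: (N, dim); frame_idx: scalar LongTensor; is_dynamic: (N,) bool
        # mask derived from the 2D semantic maps. Static points ignore the code.
        t = self.time_embed(frame_idx)                 # (dim,)
        return feat + is_dynamic.unsqueeze(-1) * t
```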
arXiv Detail & Related papers (2024-12-04T16:59:49Z)
- Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction [50.873820265165975]
We introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for dynamic scene reconstruction. We propose a GS-Threshold Joint Modeling strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling. We contribute the first event-inclusive 4D benchmark with synthetic and real-world dynamic scenes, on which our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-11-25T08:23:38Z)
- CD-NGP: A Fast Scalable Continual Representation for Dynamic Scenes [31.783117836434403]
CD-NGP is a continual learning framework that reduces memory overhead and enhances scalability. It significantly reduces training memory usage to 14 GB and requires only 0.4 MB/frame in streaming bandwidth on DyNeRF.
arXiv Detail & Related papers (2024-09-08T17:35:48Z)
- Class-agnostic Reconstruction of Dynamic Objects from Videos [127.41336060616214]
We introduce REDO, a class-agnostic framework to REconstruct the Dynamic Objects from RGBD or calibrated videos.
We develop two novel modules. First, we introduce a canonical 4D implicit function which is pixel-aligned with aggregated temporal visual cues.
Second, we develop a 4D transformation module which captures object dynamics to support temporal propagation and aggregation.
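The two modules could compose roughly as follows: the 4D transformation warps a query point at time t back to a canonical frame, where the pixel-aligned implicit function is evaluated. Both MLP designs below are assumptions, not REDO's architecture.

```python
# Hypothetical composition of a 4D warp with a canonical implicit function.
import torch
import torch.nn as nn

class CanonicalReconstructor(nn.Module):
    def __init__(self, feat_dim: int = 64, hidden: int = 128):
        super().__init__()
        # 4D transformation module: (x, y, z, t) -> offset into canonical space.
        self.warp = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 3))
        # Canonical implicit function over point + pixel-aligned feature.
        self.implicit = nn.Sequential(nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 1))

    def forward(self, xyz, t, pix_feat):
        # xyz: (N, 3); t: (N, 1); pix_feat: (N, feat_dim) sampled at projections.
        canon = xyz + self.warp(torch.cat([xyz, t], dim=-1))
        return self.implicit(torch.cat([canon, pix_feat], dim=-1))  # occupancy logit
```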
arXiv Detail & Related papers (2021-12-03T18:57:47Z)