Related papers: Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments

Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments

URL: http://arxiv.org/abs/2510.23928v1
Date: Mon, 27 Oct 2025 23:25:57 GMT
Title: Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments
Authors: Raman Jha, Yang Zhou, Giuseppe Loianno,
Abstract summary: We propose an adaptive selection method for improved 3D scene reconstruction in dynamic environments.<n>The proposed method integrates two complementary modules: an error-based selection module and a momentum-based update module.<n>We evaluate our proposed adaptive selection module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R.
Score: 10.967576917866408
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments. The proposed method integrates two complementary modules: an error-based selection module utilizing photometric and structural similarity (SSIM) errors, and a momentum-based update module that dynamically adjusts keyframe selection thresholds according to scene motion dynamics. By dynamically curating the most informative frames, our approach addresses a key data bottleneck in real-time perception. This allows for the creation of high-quality 3D world representations from a compressed data stream, a critical step towards scalable robot learning and deployment in complex, dynamic environments. Experimental results demonstrate significant improvements over traditional static keyframe selection strategies, such as fixed temporal intervals or uniform frame skipping. These findings highlight a meaningful advancement toward adaptive perception systems that can dynamically respond to complex and evolving visual scenes. We evaluate our proposed adaptive keyframe selection module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R, and observe consistent improvements in reconstruction quality across both frameworks. Furthermore, an extensive ablation study confirms the effectiveness of each individual component in our method, underlining their contribution to the overall performance gains.

Related papers

Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding [54.859943475818234]
We present Motion4D, a novel framework that integrates 2D priors from foundation models into a unified 4D Gaussian Splatting representation.<n>Our method features a two-part iterative optimization framework: 1) Sequential optimization, which updates motion and semantic fields in consecutive stages to maintain local consistency, and 2) Global optimization, which jointly refines all attributes for long-term coherence.<n>Our method significantly outperforms both 2D foundation models and existing 3D-based approaches across diverse scene understanding tasks, including point-based tracking, video object segmentation, and novel view synthesis.
arXiv Detail & Related papers (2025-12-03T09:32:56Z)
4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos [52.89084603734664]
We present 4D3R, a pose-free dynamic neural rendering framework that decouples static and dynamic components through a two-stage approach.<n>Our approach achieves up to 1.8dB PSNR improvement over state-of-the-art methods.
arXiv Detail & Related papers (2025-11-07T13:25:50Z)
FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning [65.42201665046505]
Current video understanding models rely on fixed frame sampling strategies, processing predetermined visual inputs regardless of the specific reasoning requirements of each question.<n>This static approach limits their ability to adaptively gather visual evidence, leading to suboptimal performance on tasks that require broad temporal coverage or fine-grained spatial detail.<n>We introduce FrameMind, an end-to-end framework trained with reinforcement learning that enables models to dynamically request visual information during reasoning through Frame-Interleaved Chain-of-Thought (FiCOT)<n>Unlike traditional approaches, FrameMind operates in multiple turns where the model alternates between textual reasoning and active visual perception, using tools to extract
arXiv Detail & Related papers (2025-09-28T17:59:43Z)
Puppeteer: Rig and Animate Your 3D Models [105.11046762553121]
Puppeteer is a comprehensive framework that addresses both automatic rigging and animation for diverse 3D objects.<n>Our system first predicts plausible skeletal structures via an auto-regressive transformer.<n>It then infers skinning weights via an attention-based architecture.
arXiv Detail & Related papers (2025-08-14T17:59:31Z)
Laplacian Analysis Meets Dynamics Modelling: Gaussian Splatting for 4D Reconstruction [9.911802466255653]
We propose a novel dynamic 3DGS framework with hybrid explicit-implicit functions.<n>Our method demonstrates state-of-the-art performance in reconstructing complex dynamic scenes, achieving better reconstruction fidelity.
arXiv Detail & Related papers (2025-08-07T01:39:29Z)
SD-GS: Structured Deformable 3D Gaussians for Efficient Dynamic Scene Reconstruction [5.818188539758898]
We present SD-GS, a compact and efficient dynamic splatting framework for complex dynamic scene reconstruction.<n>We also present a deformation-aware densification strategy that adaptively grows anchors in under-reconstructed high-dynamic regions.<n> Experimental results demonstrate that SD-GS achieves an average of 60% reduction in model size and an average of 100% improvement in FPS.
arXiv Detail & Related papers (2025-07-10T06:35:03Z)
Poseidon: A ViT-based Architecture for Multi-Frame Pose Estimation with Adaptive Frame Weighting and Multi-Scale Feature Fusion [43.59385149982744]
Single-frame pose estimation has seen significant progress, but it often fails to capture the temporal dynamics for understanding complex, continuous movements.<n>We propose Poseidon, a novel multi-frame pose estimation architecture that extends the ViTPose model by integrating temporal information.<n>Our approach achieves state-of-the-art performance on the PoseTrack21 and PoseTrack18 datasets, achieving mAP scores of 88.3 and 87.8, respectively.
arXiv Detail & Related papers (2025-01-14T21:34:34Z)
Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction [50.873820265165975]
We introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for dynamic scene reconstruction.<n>We propose a GS-Threshold Joint Modeling strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling.<n>We contribute the first event-inclusive 4D benchmark with synthetic and real-world dynamic scenes, on which our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-11-25T08:23:38Z)
Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction [89.53963284958037]
We propose a novel motion-aware enhancement framework for dynamic scene reconstruction. Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow. For the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed.
arXiv Detail & Related papers (2024-03-18T03:46:26Z)
Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching [17.344430840048094]
Recent learning-based methods prioritize optimal performance on a single stereo pair, resulting in temporal inconsistencies. We develop a bidirectional alignment mechanism for adjacent frames as a fundamental operation. Unlike the existing methods, we model this task as local matching and global aggregation.
arXiv Detail & Related papers (2024-03-16T01:38:28Z)
NVFi: Neural Velocity Fields for 3D Physics Learning from Dynamic Videos [8.559809421797784]
We propose to simultaneously learn the geometry, appearance, and physical velocity of 3D scenes only from video frames. We conduct extensive experiments on multiple datasets, demonstrating the superior performance of our method over all baselines.
arXiv Detail & Related papers (2023-12-11T14:07:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.