Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
- URL: http://arxiv.org/abs/2507.02863v1
- Date: Thu, 03 Jul 2025 17:59:56 GMT
- Title: Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
- Authors: Yuqi Wu, Wenzhao Zheng, Jie Zhou, Jiwen Lu
- Abstract summary: We propose Point3R, an online framework targeting dense streaming 3D reconstruction. Specifically, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs.
- Score: 72.75478398447396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense 3D scene reconstruction from an ordered sequence or an unordered image collection is a critical step in bringing computer vision research into practical scenarios. Following the paradigm introduced by DUSt3R, which densely aligns an image pair in a shared coordinate system, subsequent methods maintain an implicit memory to achieve dense 3D reconstruction from more images. However, such implicit memory is limited in capacity and may suffer from information loss of earlier frames. We propose Point3R, an online framework targeting dense streaming 3D reconstruction. Specifically, we maintain an explicit spatial pointer memory directly associated with the 3D structure of the current scene. Each pointer in this memory is assigned a specific 3D position and aggregates nearby scene information in the global coordinate system into a changing spatial feature. Information extracted from the latest frame interacts explicitly with this pointer memory, enabling dense integration of the current observation into the global coordinate system. We design a 3D hierarchical position embedding to promote this interaction, and a simple yet effective fusion mechanism to keep our pointer memory uniform and efficient. Our method achieves competitive or state-of-the-art performance on various tasks with low training costs. Code is available at: https://github.com/YkiWu/Point3R.
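For intuition, here is a minimal sketch of what such an explicit spatial pointer memory could look like, assuming it reduces to a set of (3D position, spatial feature) pairs fused on a voxel grid. The class name, the feature-averaging rule, and the voxel_size parameter are illustrative assumptions, not the released implementation.

```python
import numpy as np

class SpatialPointerMemory:
    """Hypothetical pointer memory: (position, feature) pairs in a global frame."""

    def __init__(self, feat_dim: int, voxel_size: float = 0.05):
        self.positions = np.empty((0, 3), dtype=np.float32)   # global 3D coordinates
        self.features = np.empty((0, feat_dim), dtype=np.float32)
        self.voxel_size = voxel_size   # controls how uniform the memory stays

    def integrate(self, new_positions: np.ndarray, new_features: np.ndarray) -> None:
        """Add pointers decoded from the latest frame, then fuse near-duplicates."""
        self.positions = np.vstack([self.positions, new_positions.astype(np.float32)])
        self.features = np.vstack([self.features, new_features.astype(np.float32)])
        self._fuse()

    def _fuse(self) -> None:
        # Snap pointers to a voxel grid and average pointers sharing a voxel,
        # so memory size tracks scene extent rather than the number of frames.
        keys = np.floor(self.positions / self.voxel_size).astype(np.int64)
        _, inverse, counts = np.unique(keys, axis=0, return_inverse=True,
                                       return_counts=True)
        inverse = inverse.ravel()
        pos = np.zeros((len(counts), 3), dtype=np.float32)
        feat = np.zeros((len(counts), self.features.shape[1]), dtype=np.float32)
        np.add.at(pos, inverse, self.positions)
        np.add.at(feat, inverse, self.features)
        self.positions = pos / counts[:, None]
        self.features = feat / counts[:, None]
```

Fusing pointers that land in the same voxel keeps the memory roughly uniform over the scene, so its size grows with spatial extent rather than with the number of frames, in line with the abstract's motivation for replacing capacity-limited implicit memory.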
Related papers
- MUSt3R: Multi-view Network for Stereo 3D Reconstruction [11.61182864709518]
We propose an extension of DUSt3R from pairs to multiple views that addresses all aforementioned concerns. We endow the model with a multi-layer memory mechanism that reduces the computational complexity. The framework is designed to perform 3D reconstruction both offline and online, and hence can be seamlessly applied to SfM and visual SLAM scenarios.
arXiv Detail & Related papers (2025-03-03T15:36:07Z)
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive octree structure is developed that stores semantics and represents an object's occupancy adaptively according to its shape (a schematic sketch follows this entry).
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
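To make the Octree-Graph entry above concrete, here is a schematic adaptive octree that stores an occupancy flag and a semantic feature per node and only refines occupied cells; the OctreeNode class, the min_size stopping rule, and mean-pooled semantics are hypothetical choices for illustration, not the paper's construction.

```python
import itertools
import numpy as np

class OctreeNode:
    """Hypothetical adaptive octree cell holding occupancy and a semantic feature."""

    def __init__(self, center, half_size):
        self.center = np.asarray(center, dtype=np.float32)
        self.half_size = float(half_size)
        self.children = None      # stays None for leaf cells
        self.occupied = False
        self.semantic = None      # e.g. a pooled open-vocabulary embedding

    def insert(self, points, semantics, min_size=0.05):
        """points: (N, 3) points inside this cell; semantics: (N, D) features."""
        if len(points) == 0:
            return                # empty cells are never refined: shape-adaptive
        self.occupied = True
        if self.half_size <= min_size:
            self.semantic = semantics.mean(axis=0)   # leaf: pool semantics
            return
        if self.children is None:
            self.children = [
                OctreeNode(self.center + 0.5 * self.half_size * np.array(off),
                           0.5 * self.half_size)
                for off in itertools.product((-1.0, 1.0), repeat=3)
            ]
        for child in self.children:
            inside = np.all(np.abs(points - child.center) <= child.half_size, axis=1)
            child.insert(points[inside], semantics[inside], min_size)
```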
- 3D Reconstruction with Spatial Memory [9.282647987510499]
We present Spann3R, a novel approach for dense 3D reconstruction from ordered or unordered image collections.
Built on the DUSt3R paradigm, Spann3R uses a transformer-based architecture to directly regress pointmaps from images without any prior knowledge of the scene or camera parameters.
arXiv Detail & Related papers (2024-08-28T18:01:00Z)
- Memory-based Adapters for Online 3D Scene Perception [71.71645534899905]
Conventional 3D scene perception methods are offline, i.e., take an already reconstructed 3D scene geometry as input.
We propose an adapter-based plug-and-play module for the backbone of 3D scene perception models (a sketch of the general recipe follows this entry).
Our adapters can be easily inserted into mainstream offline architectures of different tasks and significantly boost their performance on online tasks.
arXiv Detail & Related papers (2024-03-11T17:57:41Z)
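A common recipe for such plug-and-play adapters is a residual bottleneck around frozen backbone features plus a running memory of past frames. The PyTorch sketch below follows that generic pattern; the MemoryAdapter name, the momentum-updated buffer, and fusion by addition are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class MemoryAdapter(nn.Module):
    """Hypothetical bottleneck adapter with a momentum-updated frame memory."""

    def __init__(self, dim: int, bottleneck: int = 64, momentum: float = 0.9):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()
        self.momentum = momentum
        self.register_buffer("memory", torch.zeros(1, dim))  # summary of past frames

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) features from a frozen offline backbone for one frame.
        fused = x + self.memory                          # inject temporal context
        out = x + self.up(self.act(self.down(fused)))    # residual bottleneck
        with torch.no_grad():                            # update the running memory
            self.memory.mul_(self.momentum)
            self.memory.add_((1.0 - self.momentum) * x.mean(0, keepdim=True))
        return out
```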
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
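Once a CNN has detected the 2D image locations of the predetermined 3D landmarks, camera localization reduces to a standard PnP solve. The snippet below shows that final step with OpenCV; the landmark coordinates, detections, and intrinsics are placeholder values, and the detector itself is abstracted away.

```python
import numpy as np
import cv2

# Placeholder scene-specific 3D landmarks (world frame), their 2D detections
# in the current image, and camera intrinsics -- all values are illustrative.
landmarks_3d = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
                         [0.5, 0.5, 1], [0, 1, 1]], dtype=np.float64)
detections_2d = np.array([[320, 240], [420, 238], [322, 150], [424, 148],
                          [372, 100], [318, 60]], dtype=np.float64)
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(landmarks_3d, detections_2d,
                                             K, distCoeffs=None)
if ok:
    R, _ = cv2.Rodrigues(rvec)                 # world-to-camera rotation
    print("camera center (world frame):", (-R.T @ tvec).ravel())
```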
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation [19.973034777285218]
We propose a novel fusion-aware 3D point convolution which operates directly on the geometric surface being reconstructed.
Globally, we compile the online reconstructed 3D points into an incrementally growing coordinate interval tree (a simplified indexing sketch follows this entry).
We maintain the neighborhood information for each point using an octree whose construction benefits from the fast query of the global tree.
arXiv Detail & Related papers (2020-03-13T12:32:24Z)
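As a simplified stand-in for the interval-tree-plus-octree bookkeeping described above, the sketch below indexes streaming points in a voxel hash map so that inserting a new frame and querying a point's neighborhood both stay cheap as the reconstruction grows; the StreamingNeighborIndex class and its cell size are illustrative assumptions, not the paper's data structures.

```python
from collections import defaultdict
import itertools
import numpy as np

class StreamingNeighborIndex:
    """Hypothetical incremental index: voxel hash map over streaming 3D points."""

    def __init__(self, cell: float = 0.1):
        self.cell = cell
        self.grid = defaultdict(list)   # voxel key -> indices of contained points
        self.points = []

    def insert(self, pts) -> None:
        """Insert one frame's reconstructed points incrementally."""
        for p in np.asarray(pts, dtype=np.float32):
            key = tuple(np.floor(p / self.cell).astype(int))
            self.grid[key].append(len(self.points))
            self.points.append(p)

    def neighbors(self, p, radius: float):
        """Indices of stored points within `radius` of p, scanning (2r+1)^3 cells."""
        p = np.asarray(p, dtype=np.float32)
        base = np.floor(p / self.cell).astype(int)
        r = int(np.ceil(radius / self.cell))
        out = []
        for off in itertools.product(range(-r, r + 1), repeat=3):
            for idx in self.grid.get(tuple(base + off), []):
                if np.linalg.norm(self.points[idx] - p) <= radius:
                    out.append(idx)
        return out
```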