Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction
- URL: http://arxiv.org/abs/2503.16318v1
- Date: Thu, 20 Mar 2025 16:41:50 GMT
- Title: Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction
- Authors: Edgar Sucar, Zihang Lai, Eldar Insafutdinov, Andrea Vedaldi
- Abstract summary: We introduce Dynamic Point Maps (DPM), extending standard point maps to support 4D tasks such as motion segmentation, scene flow estimation, 3D object tracking, and 2D correspondence. We train a DPM predictor on a mixture of synthetic and real data and evaluate it across diverse benchmarks for video depth prediction, dynamic point cloud reconstruction, 3D scene flow and object pose tracking, achieving state-of-the-art performance.
- Score: 56.32589034046427
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: DUSt3R has recently shown that one can reduce many tasks in multi-view geometry, including estimating camera intrinsics and extrinsics, reconstructing the scene in 3D, and establishing image correspondences, to the prediction of a pair of viewpoint-invariant point maps, i.e., pixel-aligned point clouds defined in a common reference frame. This formulation is elegant and powerful, but unable to tackle dynamic scenes. To address this challenge, we introduce the concept of Dynamic Point Maps (DPM), extending standard point maps to support 4D tasks such as motion segmentation, scene flow estimation, 3D object tracking, and 2D correspondence. Our key intuition is that, when time is introduced, there are several possible spatial and time references that can be used to define the point maps. We identify a minimal subset of such combinations that can be regressed by a network to solve the subtasks mentioned above. We train a DPM predictor on a mixture of synthetic and real data and evaluate it across diverse benchmarks for video depth prediction, dynamic point cloud reconstruction, 3D scene flow and object pose tracking, achieving state-of-the-art performance. Code, models and additional results are available at https://www.robots.ox.ac.uk/~vgg/research/dynamic-point-maps/.
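The core idea, that introducing time multiplies the possible (viewpoint, timestamp) references in which a point map can be expressed, can be made concrete with a small sketch. The class and function names below are illustrative assumptions, not the authors' actual interface: a dynamic point map stores pixel-aligned 3D points together with the spatial frame and the time at which they are expressed, and quantities such as scene flow or a motion mask can then be read off pairs of maps that share one reference and differ in the other.

```python
# Minimal illustrative sketch of the dynamic-point-map idea; all names
# here are assumptions for exposition, not the paper's API.
from dataclasses import dataclass
import numpy as np

@dataclass
class DynamicPointMap:
    points: np.ndarray  # (H, W, 3) pixel-aligned 3D points
    view: int           # index of the camera frame the points are expressed in
    time: int           # timestamp at which the scene geometry is "frozen"

def scene_flow(a: DynamicPointMap, b: DynamicPointMap) -> np.ndarray:
    """Per-pixel 3D motion: same pixels and spatial frame, two timestamps."""
    assert a.view == b.view and a.time != b.time
    return b.points - a.points  # (H, W, 3) displacement vectors

def motion_mask(a: DynamicPointMap, b: DynamicPointMap,
                thresh: float = 0.01) -> np.ndarray:
    """Motion segmentation: pixels whose 3D points move between timestamps."""
    return np.linalg.norm(scene_flow(a, b), axis=-1) > thresh
```

Under this reading, a network that regresses the minimal subset of (viewpoint, timestamp) combinations identified in the paper would let video depth, scene flow, motion segmentation, and correspondence all be read off the same set of predictions.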
Related papers
- POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction [53.19968902152528]
We present POMATO, a unified framework for dynamic 3D reconstruction by marrying pointmap matching with temporal motion.
Specifically, our method learns an explicit matching relationship by mapping RGB pixels from both dynamic and static regions across different views to 3D pointmaps.
We demonstrate the effectiveness of the proposed pointmap matching and temporal fusion paradigm through strong performance across multiple downstream tasks.
arXiv Detail & Related papers (2025-04-08T05:33:13Z) - VGGT: Visual Geometry Grounded Transformer [61.37669770946458]
VGGT is a feed-forward neural network that directly infers all key 3D attributes of a scene. The network achieves state-of-the-art results in multiple 3D tasks.
arXiv Detail & Related papers (2025-03-14T17:59:47Z) - DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction [67.13370009386635]
We introduce Dual Point Maps (DualPM), where a pair of point maps is extracted from the same image, one associating pixels to their 3D locations on the object, and the other to a canonical version of the object at rest pose. We show that 3D reconstruction and 3D pose estimation reduce to the prediction of the DualPMs (a minimal sketch of this dual-map idea appears after this list).
arXiv Detail & Related papers (2024-12-05T18:59:48Z) - TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z) - Object-level 3D Semantic Mapping using a Network of Smart Edge Sensors [25.393382192511716]
We extend a multi-view 3D semantic mapping system, consisting of a network of distributed edge sensors, with object-level information.
Our method is evaluated on the public Behave dataset, where it achieves pose estimates accurate to within a few centimeters, and in real-world experiments with the sensor network in a challenging lab environment.
arXiv Detail & Related papers (2022-11-21T11:13:08Z) - Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z) - Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping [23.456046776979903]
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic) to learn a neural 3D mapping module.
The neural 3D mapper consumes RGB-D data as input, and produces a 3D voxel grid of deep features as output.
We show that our unsupervised 3D object trackers outperform prior unsupervised 2D and 2.5D trackers, and approach the accuracy of supervised trackers.
arXiv Detail & Related papers (2020-08-04T02:59:23Z)
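The DualPM entry above compresses the method into one sentence; the sketch below unpacks it under stated assumptions. All names are hypothetical, not taken from the paper: given the two pixel-aligned maps, one posed and one canonical, pose estimation reduces to rigidly aligning corresponding points, e.g. with a Kabsch/Procrustes fit.

```python
# Hypothetical sketch of the dual-point-map idea (all names assumed):
# two pixel-aligned point maps from one image, one posed, one canonical.
import numpy as np

def estimate_pose(posed: np.ndarray, canonical: np.ndarray, mask: np.ndarray):
    """Rigid fit canonical -> posed via the Kabsch/Procrustes algorithm.

    posed, canonical: (H, W, 3) point maps; mask: (H, W) boolean validity.
    Returns rotation R (3, 3) and translation t (3,).
    """
    p = posed[mask]       # (N, 3) points in the posed/camera space
    c = canonical[mask]   # (N, 3) corresponding points in rest-pose space
    p_mu, c_mu = p.mean(axis=0), c.mean(axis=0)
    H = (c - c_mu).T @ (p - p_mu)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = p_mu - R @ c_mu
    return R, t
```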