St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World
- URL: http://arxiv.org/abs/2504.13152v1
- Date: Thu, 17 Apr 2025 17:55:58 GMT
- Title: St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World
- Authors: Haiwen Feng, Junyi Zhang, Qianqian Wang, Yufei Ye, Pengcheng Yu, Michael J. Black, Trevor Darrell, Angjoo Kanazawa
- Abstract summary: St4RTrack is a framework that simultaneously reconstructs and tracks dynamic video content in a world coordinate frame from RGB inputs. We predict both pointmaps at the same moment, in the same world, capturing both static and dynamic scene geometry. We establish a new extensive benchmark for world-frame reconstruction and tracking, demonstrating the effectiveness and efficiency of our unified, data-driven framework.
- Score: 106.91539872943864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic 3D reconstruction and point tracking in videos are typically treated as separate tasks, despite their deep connection. We propose St4RTrack, a feed-forward framework that simultaneously reconstructs and tracks dynamic video content in a world coordinate frame from RGB inputs. This is achieved by predicting two appropriately defined pointmaps for a pair of frames captured at different moments. Specifically, we predict both pointmaps at the same moment, in the same world, capturing both static and dynamic scene geometry while maintaining 3D correspondences. Chaining these predictions through the video sequence with respect to a reference frame naturally computes long-range correspondences, effectively combining 3D reconstruction with 3D tracking. Unlike prior methods that rely heavily on 4D ground truth supervision, we employ a novel adaptation scheme based on a reprojection loss. We establish a new extensive benchmark for world-frame reconstruction and tracking, demonstrating the effectiveness and efficiency of our unified, data-driven framework. Our code, model, and benchmark will be released.
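The chaining idea in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' released code: it assumes a hypothetical pairwise predictor `model(frame_ref, frame_t)` that returns the two pointmaps described above, and a pinhole reprojection loss of the kind the adaptation scheme could use; all function and variable names are placeholders.

```python
import numpy as np

def chain_world_frame_tracks(model, frames, ref_idx=0):
    """Sketch of the chaining step: for every frame t, predict two pointmaps for
    the pair (reference frame, frame t), both expressed at time t in a shared
    world frame. The pointmap tied to the reference pixels keeps per-pixel
    correspondence, so stacking it over time yields long-range 3D tracks."""
    tracks, geometry = [], []
    for t in range(len(frames)):
        # `model` stands in for a feed-forward pairwise pointmap predictor.
        pts_ref_at_t, pts_t_at_t = model(frames[ref_idx], frames[t])
        tracks.append(pts_ref_at_t)   # 3D position of each reference pixel at time t
        geometry.append(pts_t_at_t)   # world-frame geometry observed in frame t
    return np.stack(tracks), geometry

def reprojection_loss(points_world, track_uv, extrinsics, intrinsics):
    """Illustrative reprojection loss: project predicted world-frame points into
    the camera of frame t and penalise the 2D distance to observed track
    locations (e.g. from an off-the-shelf 2D tracker)."""
    R, t = extrinsics[:3, :3], extrinsics[:3, 3]
    pts_cam = points_world @ R.T + t              # world -> camera coordinates
    uv_hom = pts_cam @ intrinsics.T               # pinhole projection
    uv = uv_hom[:, :2] / uv_hom[:, 2:3]
    return float(np.mean(np.linalg.norm(uv - track_uv, axis=-1)))
```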
Related papers
- D^2USt3R: Enhancing 3D Reconstruction with 4D Pointmaps for Dynamic Scenes [40.371542172080105]
We propose D2USt3R, which regresses 4D pointmaps that simultaneously capture both static and dynamic 3D scene geometry in a feed-forward manner. By explicitly incorporating both spatial and temporal aspects, our approach successfully encapsulates spatio-temporal dense correspondence into the proposed 4D pointmaps, enhancing downstream tasks.
arXiv Detail & Related papers (2025-04-08T17:59:50Z)
- POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction [53.19968902152528]
We present POMATO, a unified framework for dynamic 3D reconstruction by marrying pointmap matching with temporal motion. Specifically, our method learns an explicit matching relationship by mapping RGB pixels from both dynamic and static regions across different views to 3D pointmaps. We show the effectiveness of the proposed pointmap matching and temporal fusion paradigm by demonstrating remarkable performance across multiple downstream tasks.
arXiv Detail & Related papers (2025-04-08T05:33:13Z)
- Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction [56.32589034046427]
We introduce Dynamic Point Maps (DPM), extending standard point maps to support 4D tasks such as motion segmentation, scene flow estimation, 3D object tracking, and 2D correspondence. We train a DPM predictor on a mixture of synthetic and real data and evaluate it across diverse benchmarks for video depth prediction, dynamic point cloud reconstruction, 3D scene flow, and object pose tracking, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-03-20T16:41:50Z)
- SIRE: SE(3) Intrinsic Rigidity Embeddings [16.630400019100943]
We introduce SIRE, a self-supervised method for motion discovery of objects and dynamic scene reconstruction from casual videos. Our method trains an image encoder to estimate scene rigidity and geometry, supervised by a simple 4D reconstruction loss. Our findings suggest that SIRE can learn strong geometry and motion rigidity priors from video data with minimal supervision.
arXiv Detail & Related papers (2025-03-10T18:00:30Z)
- Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving [116.10577967146762]
We propose Driv3R, a framework that directly regresses per-frame point maps from multi-view image sequences. We employ a 4D flow predictor to identify moving objects within the scene, directing our network to focus more on reconstructing these dynamic regions. Driv3R outperforms previous frameworks in 4D dynamic scene reconstruction, achieving 15x faster inference speed.
arXiv Detail & Related papers (2024-12-09T18:58:03Z)
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- Fast Encoder-Based 3D from Casual Videos via Point Track Processing [22.563073026889324]
We present TracksTo4D, a learning-based approach that enables inferring 3D structure and camera positions from dynamic content originating from casual videos.
TracksTo4D is trained in an unsupervised way on a dataset of casual videos.
Experiments show that TracksTo4D can reconstruct a temporal point cloud and camera positions of the underlying video with accuracy comparable to state-of-the-art methods.
arXiv Detail & Related papers (2024-04-10T15:37:00Z)
- Tracking by 3D Model Estimation of Unknown Objects in Videos [122.56499878291916]
We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
arXiv Detail & Related papers (2023-04-13T11:32:36Z)