SpatialTrackerV2: 3D Point Tracking Made Easy
- URL: http://arxiv.org/abs/2507.12462v2
- Date: Sat, 19 Jul 2025 02:07:12 GMT
- Title: SpatialTrackerV2: 3D Point Tracking Made Easy
- Authors: Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Yuri Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou
- Abstract summary: SpatialTrackerV2 is a feed-forward 3D point tracking method for monocular videos. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%.
- Score: 73.0350898700048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing and feedforward 3D point tracker. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%, and matches the accuracy of leading dynamic 3D reconstruction approaches while running 50$\times$ faster.
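The abstract's central idea is a decomposition of world-space 3D motion into three factors: scene geometry (depth), camera ego-motion (pose), and pixel-wise object motion. A minimal sketch of how such a composition could look, assuming a standard pinhole back-projection; the function name, arguments, and residual-motion formulation here are illustrative, not the paper's actual API:

```python
import numpy as np

def world_space_point(uv, depth, K, T_wc, object_motion):
    """Compose a world-space 3D point from the three factors named in the
    abstract: scene geometry (depth), camera ego-motion (pose T_wc), and
    pixel-wise object motion. Illustrative sketch, not the paper's code."""
    u, v = uv
    # Back-project the pixel to a camera-space 3D point using depth (geometry).
    x_cam = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Move into the world frame with the camera-to-world pose (ego-motion).
    x_world = T_wc[:3, :3] @ x_cam + T_wc[:3, 3]
    # Add the residual per-pixel object motion (the dynamic-scene part).
    return x_world + object_motion
```

With identity intrinsics and pose, the point is simply the back-projected pixel shifted by the object-motion term, which makes the three-way factorization easy to unit-test.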
Related papers
- TAPIP3D: Tracking Any Point in Persistent 3D Geometry [25.357437591411347]
We introduce TAPIP3D, a novel approach for long-term 3D point tracking in monocular and RGB-D videos. TAPIP3D represents videos as camera-stabilized feature clouds, leveraging depth and camera motion information. Our 3D-centric formulation significantly improves performance over existing 3D point tracking methods.
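The "camera-stabilized" representation mentioned in this summary amounts to mapping each frame's camera-space points into a single fixed world frame using the known camera poses, so that camera motion no longer appears as point motion. A hedged sketch of that transform; the function name and pose convention (camera-to-world, row-vector points) are assumptions for illustration:

```python
import numpy as np

def stabilize_cloud(points_cam, T_wc):
    """Map an (N, 3) camera-space point cloud into a fixed world frame using
    the camera-to-world pose T_wc (4x4). After this transform, a static scene
    point has the same coordinates in every frame. Illustrative only."""
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    return points_cam @ R.T + t
```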
arXiv Detail & Related papers (2025-04-20T19:09:43Z)
- DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction [65.46359561104867]
We target the challenge of online 2D and 3D point tracking from unposed monocular camera input. We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.
arXiv Detail & Related papers (2024-09-03T17:58:03Z)
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- SpatialTracker: Tracking Any 2D Pixels in 3D Space [71.58016288648447]
We propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection.
Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators.
Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
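The as-rigid-as-possible (ARAP) constraint this summary mentions can be pictured as a penalty that keeps pairwise 3D distances constant over time within each rigid cluster, weighted by a learned rigidity affinity. A toy sketch under those assumptions; the function name and the simple pairwise-distance energy are illustrative, not SpatialTracker's actual loss:

```python
import numpy as np

def arap_energy(pts_t0, pts_t1, rigidity):
    """Toy ARAP penalty: within a rigid part, pairwise distances between
    tracked 3D points should not change across frames. rigidity[i, j]
    weights how strongly points i and j are believed to share a rigid
    part (e.g. from a learned rigidity embedding). Illustrative only."""
    d0 = np.linalg.norm(pts_t0[:, None] - pts_t0[None, :], axis=-1)
    d1 = np.linalg.norm(pts_t1[:, None] - pts_t1[None, :], axis=-1)
    return float(np.sum(rigidity * (d1 - d0) ** 2))
```

A rigid motion (e.g. a pure translation of all points) leaves every pairwise distance unchanged, so the energy is zero; non-rigid deformations are penalized in proportion to the rigidity weights.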
arXiv Detail & Related papers (2024-04-05T17:59:25Z)
- Tracking by 3D Model Estimation of Unknown Objects in Videos [122.56499878291916]
We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
arXiv Detail & Related papers (2023-04-13T11:32:36Z)
- Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving [3.8073142980733]
We propose jointly training 3D detection and 3D tracking from only monocular videos in an end-to-end manner.
Time3D achieves 21.4% AMOTA, 13.6% AMOTP on the nuScenes 3D tracking benchmark, surpassing all published competitors.
arXiv Detail & Related papers (2022-05-30T06:41:10Z)
- 3D Visual Tracking Framework with Deep Learning for Asteroid Exploration [22.808962211830675]
In this paper, we focus on an accurate and real-time method for 3D tracking.
A new large-scale 3D asteroid tracking dataset is presented, including binocular video sequences, depth maps, and point clouds of diverse asteroids.
We propose a deep-learning-based 3D tracking framework, named Track3D, which combines a 2D monocular tracker with a novel lightweight amodal axis-aligned bounding-box network, A3BoxNet.
arXiv Detail & Related papers (2021-11-21T04:14:45Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.