Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels
- URL: http://arxiv.org/abs/2603.02573v2
- Date: Thu, 05 Mar 2026 05:37:44 GMT
- Title: Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels
- Authors: Jiahao Lu, Jiayi Xu, Wenbo Hu, Ruijie Zhu, Chengfeng Zhao, Sai-Kit Yeung, Ying Shan, Yuan Liu,
- Abstract summary: Estimating the 3D trajectory of every pixel from a monocular video is crucial and promising for a comprehensive understanding of the 3D dynamics of videos.<n>Recent monocular 3D tracking works demonstrate impressive performance, but are limited to either tracking sparse points on the first frame or a slow optimization-based framework for dense tracking.<n>We propose a feedforward model, called Track4World, enabling an efficient holistic 3D tracking of every pixel in the world-centric coordinate system.
- Score: 67.36972154532761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the 3D trajectory of every pixel from a monocular video is crucial and promising for a comprehensive understanding of the 3D dynamics of videos. Recent monocular 3D tracking works demonstrate impressive performance, but are limited to either tracking sparse points on the first frame or a slow optimization-based framework for dense tracking. In this paper, we propose a feedforward model, called Track4World, enabling an efficient holistic 3D tracking of every pixel in the world-centric coordinate system. Built on the global 3D scene representation encoded by a VGGT-style ViT, Track4World applies a novel 3D correlation scheme to simultaneously estimate the pixel-wise 2D and 3D dense flow between arbitrary frame pairs. The estimated scene flow, along with the reconstructed 3D geometry, enables subsequent efficient 3D tracking of every pixel of this video. Extensive experiments on multiple benchmarks demonstrate that our approach consistently outperforms existing methods in 2D/3D flow estimation and 3D tracking, highlighting its robustness and scalability for real-world 4D reconstruction tasks.
Related papers
- TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels [49.456225518469516]
Monocular 3D tracking aims to capture the long-term motion of pixels in 3D space from a single monocular video.<n>We propose TrackingWorld, a novel pipeline for dense 3D tracking of almost all pixels within a world-centric 3D coordinate system.
arXiv Detail & Related papers (2025-12-09T08:35:42Z) - SpatialTrackerV2: 3D Point Tracking Made Easy [73.0350898700048]
SpatialTrackerV2 is a feed-forward 3D point tracking method for monocular videos.<n>It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion.<n>By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%.
arXiv Detail & Related papers (2025-07-16T17:59:03Z) - TAPIP3D: Tracking Any Point in Persistent 3D Geometry [25.357437591411347]
We introduce TAPIP3D, a novel approach for long-term 3D point tracking in monocular and RGB-D videos.<n>TAPIP3D represents videos as camera-stabilized feature clouds, leveraging depth and camera motion information.<n>Our 3D-centric formulation significantly improves performance over existing 3D point tracking methods.
arXiv Detail & Related papers (2025-04-20T19:09:43Z) - TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z) - SpatialTracker: Tracking Any 2D Pixels in 3D Space [71.58016288648447]
We propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection.
Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators.
Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
arXiv Detail & Related papers (2024-04-05T17:59:25Z) - Tracking by 3D Model Estimation of Unknown Objects in Videos [122.56499878291916]
We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
arXiv Detail & Related papers (2023-04-13T11:32:36Z) - 3D Visual Tracking Framework with Deep Learning for Asteroid Exploration [22.808962211830675]
In this paper, we focus on the studied accurate and real-time method for 3D tracking.
A new large-scale 3D asteroid tracking dataset is presented, including binocular video sequences, depth maps, and point clouds of diverse asteroids.
We propose a deep-learning based 3D tracking framework, named as Track3D, which involves 2D monocular tracker and a novel light-weight amodal axis-aligned bounding-box network, A3BoxNet.
arXiv Detail & Related papers (2021-11-21T04:14:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.