TAPVid-360: Tracking Any Point in 360 from Narrow Field of View Video
- URL: http://arxiv.org/abs/2511.21946v1
- Date: Wed, 26 Nov 2025 22:13:26 GMT
- Title: TAPVid-360: Tracking Any Point in 360 from Narrow Field of View Video
- Authors: Finlay G. C. Hudson, James A. D. Gardner, William A. P. Smith
- Abstract summary: We introduce TAPVid-360, a novel task that requires predicting the 3D direction to queried scene points across a video sequence. We exploit 360 videos as a source of supervision, resampling them into narrow field-of-view perspectives while computing ground truth directions. Our baseline adapts CoTracker v3 to predict per-point rotations for direction updates, outperforming existing TAP and TAPVid-3D methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans excel at constructing panoramic mental models of their surroundings, maintaining object permanence and inferring scene structure beyond visible regions. In contrast, current artificial vision systems struggle with persistent, panoramic understanding, often processing scenes egocentrically on a frame-by-frame basis. This limitation is pronounced in the Track Any Point (TAP) task, where existing methods fail to track 2D points outside the field of view. To address this, we introduce TAPVid-360, a novel task that requires predicting the 3D direction to queried scene points across a video sequence, even when they lie far outside the narrow field of view of the observed video. This task fosters learning allocentric scene representations without needing dynamic 4D ground-truth scene models for training. Instead, we exploit 360 videos as a source of supervision, resampling them into narrow field-of-view perspectives while computing ground-truth directions by tracking points across the full panorama using a 2D pipeline. We introduce a new dataset and benchmark, TAPVid360-10k, comprising 10k perspective videos with ground-truth directional point tracking. Our baseline adapts CoTracker v3 to predict per-point rotations for direction updates, outperforming existing TAP and TAPVid-3D methods.
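The supervision recipe and the baseline's update rule are concrete enough to sketch in code. Below is a minimal illustration, not the authors' implementation: it maps an equirectangular pixel (from a 2D track on the full panorama) to a unit 3D direction, which is the ground-truth signal the dataset provides, and then applies a small per-point rotation to a direction estimate, mirroring how the CoTracker v3 baseline is described as updating directions. The coordinate conventions, function names, and numeric values are assumptions for illustration only.

```python
# Minimal sketch (not the authors' code). Axis conventions and angle ranges
# are assumptions: longitude spans [-pi, pi) left-to-right, latitude spans
# [pi/2, -pi/2] top-to-bottom, with x right, y up, z forward.
import numpy as np


def equirect_pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit 3D direction."""
    lon = (u / width) * 2.0 * np.pi - np.pi      # azimuth
    lat = np.pi / 2.0 - (v / height) * np.pi     # elevation
    d = np.array([
        np.cos(lat) * np.sin(lon),   # x: right
        np.sin(lat),                 # y: up
        np.cos(lat) * np.cos(lon),   # z: forward
    ])
    return d / np.linalg.norm(d)


def rotation_matrix_axis_angle(axis, angle):
    """Rodrigues' formula: rotation matrix from a unit axis and an angle."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


# A 2D track on the full panorama gives a ground-truth direction per frame,
# even when the point leaves the narrow-FoV crop (example pixel values):
d_t = equirect_pixel_to_direction(u=1920, v=480, width=3840, height=1920)

# The baseline predicts a small per-point rotation each frame and applies it
# to the previous direction estimate (hypothetical axis/angle for illustration):
R = rotation_matrix_axis_angle(axis=np.array([0.0, 1.0, 0.0]), angle=0.02)
d_next = R @ d_t
```

Because directions live on the unit sphere, a rotation update always yields a valid unit direction, even when the queried point drifts far outside the narrow field of view.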
Related papers
- 360Anything: Geometry-Free Lifting of Images and Videos to 360° [51.50120114305155]
Existing approaches rely on explicit geometric alignment between the perspective and the equirectangular projection space. We propose 360Anything, a geometry-free framework built upon pre-trained diffusion transformers. Our approach achieves state-of-the-art performance on both image and video perspective-to-360° generation.
arXiv Detail & Related papers (2026-01-22T18:45:59Z) - Multi-View 3D Point Tracking [67.21282192436031]
We introduce the first data-driven multi-view 3D point tracker, designed to track arbitrary points in dynamic scenes using multiple camera views. Our model directly predicts 3D correspondences using a practical number of cameras. We train on 5K synthetic multi-view Kubric sequences and evaluate on two real-world benchmarks.
arXiv Detail & Related papers (2025-08-28T17:58:20Z) - SpatialTrackerV2: 3D Point Tracking Made Easy [73.0350898700048]
SpatialTrackerV2 is a feed-forward 3D point tracking method for monocular videos. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%.
arXiv Detail & Related papers (2025-07-16T17:59:03Z) - WorldExplorer: Towards Generating Fully Navigable 3D Scenes [48.16064304951891]
WorldExplorer builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We generate multiple videos along short, pre-defined trajectories that explore the scene in depth. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results.
arXiv Detail & Related papers (2025-06-02T15:41:31Z) - Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos [64.10180665546237]
360° videos offer a more complete perspective of our surroundings. Existing video models excel at producing standard videos, but their ability to generate full panoramic videos remains elusive. We develop a high-quality data filtering pipeline to curate pairwise training data and improve the quality of 360° video generation. Experimental results demonstrate that our model can generate realistic and coherent 360° videos from in-the-wild perspective videos.
arXiv Detail & Related papers (2025-04-10T17:51:38Z) - SIRE: SE(3) Intrinsic Rigidity Embeddings [16.630400019100943]
We introduce SIRE, a self-supervised method for motion discovery of objects and dynamic scene reconstruction from casual scenes. Our method trains an image encoder to estimate scene rigidity and geometry, supervised by a simple 4D reconstruction loss. Our findings suggest that SIRE can learn strong geometry and motion rigidity priors from video data, with minimal supervision.
arXiv Detail & Related papers (2025-03-10T18:00:30Z) - TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z) - Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping [23.456046776979903]
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic) to learn a neural 3D mapping module.
The neural 3D mapper consumes RGB-D data as input, and produces a 3D voxel grid of deep features as output.
We show that our unsupervised 3D object trackers outperform prior unsupervised 2D and 2.5D trackers, and approach the accuracy of supervised trackers.
arXiv Detail & Related papers (2020-08-04T02:59:23Z)