AllTracker: Efficient Dense Point Tracking at High Resolution
- URL: http://arxiv.org/abs/2506.07310v2
- Date: Fri, 01 Aug 2025 18:44:17 GMT
- Title: AllTracker: Efficient Dense Point Tracking at High Resolution
- Authors: Adam W. Harley, Yang You, Xinglong Sun, Yang Zheng, Nikhil Raghuraman, Yunqi Gu, Sheldon Liang, Wen-Hsuan Chu, Achal Dave, Pavel Tokmakov, Suya You, Rares Ambrus, Katerina Fragkiadaki, Leonidas J. Guibas
- Abstract summary: We introduce AllTracker, a model that estimates long-range point tracks by way of estimating the flow field between a query frame and every other frame of a video. Unlike existing point tracking methods, our approach delivers high-resolution and dense (all-pixel) correspondence fields, which can be visualized as flow maps. The model is fast and parameter-efficient (16 million parameters), and delivers state-of-the-art point tracking accuracy at high resolution (i.e., tracking 768x1024 pixels on a 40G GPU).
- Score: 62.840979507761425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce AllTracker: a model that estimates long-range point tracks by way of estimating the flow field between a query frame and every other frame of a video. Unlike existing point tracking methods, our approach delivers high-resolution and dense (all-pixel) correspondence fields, which can be visualized as flow maps. Unlike existing optical flow methods, our approach corresponds one frame to hundreds of subsequent frames, rather than just the next frame. We develop a new architecture for this task, blending techniques from existing work in optical flow and point tracking: the model performs iterative inference on low-resolution grids of correspondence estimates, propagating information spatially via 2D convolution layers, and propagating information temporally via pixel-aligned attention layers. The model is fast and parameter-efficient (16 million parameters), and delivers state-of-the-art point tracking accuracy at high resolution (i.e., tracking 768x1024 pixels on a 40G GPU). A benefit of our design is that we can train jointly on optical flow datasets and point tracking datasets, and we find that doing so is crucial for top performance. We provide an extensive ablation study on our architecture details and training recipe, making it clear which details matter most. Our code and model weights are available at https://alltracker.github.io
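To make the architecture concrete, below is a minimal PyTorch sketch of one refinement step in the style the abstract describes: 2D convolutions propagate information spatially within each frame's low-resolution correspondence grid, and a pixel-aligned attention layer (self-attention over the frame axis at each grid location) propagates information temporally. All module names, channel sizes, and the update rule are illustrative assumptions, not AllTracker's actual implementation.

```python
# Minimal sketch of an iterative spatial-conv + pixel-aligned-attention
# update block. Everything here (names, sizes, update rule) is an
# illustrative assumption, not AllTracker's released code.
import torch
import torch.nn as nn

class SpatioTemporalUpdateBlock(nn.Module):
    """One refinement step over low-resolution correspondence grids."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Spatial propagation: 2D convolutions within each frame's grid.
        self.spatial = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        # Temporal propagation: pixel-aligned attention, i.e. self-attention
        # over the frame axis, run independently at each grid location.
        self.norm = nn.LayerNorm(dim)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Project features to a 2-channel flow (correspondence) update.
        self.to_delta = nn.Conv2d(dim, 2, 1)

    def forward(self, feats):
        # feats: (B, T, C, H, W) features on a low-resolution grid.
        B, T, C, H, W = feats.shape
        x = feats.reshape(B * T, C, H, W)
        x = x + self.spatial(x)  # mix spatially within each frame
        # Fold grid locations into the batch so attention sees only the
        # T frames that share a pixel location.
        x = x.reshape(B, T, C, H, W).permute(0, 3, 4, 1, 2)
        x = x.reshape(B * H * W, T, C)
        h = self.norm(x)
        x = x + self.temporal(h, h, h, need_weights=False)[0]
        x = x.reshape(B, H, W, T, C).permute(0, 3, 4, 1, 2).contiguous()
        delta = self.to_delta(x.reshape(B * T, C, H, W))
        return x, delta.reshape(B, T, 2, H, W)

# Iterative inference: accumulate flow updates from the query frame to
# every frame in the window.
block = SpatioTemporalUpdateBlock()
feats = torch.randn(1, 8, 128, 24, 32)   # 8 frames, 24x32 grid
flow = torch.zeros(1, 8, 2, 24, 32)      # query-to-frame-t flow
for _ in range(4):
    feats, delta = block(feats)
    flow = flow + delta
```

Folding the HxW grid locations into the batch dimension is what makes the attention pixel-aligned: each location attends only to its own feature across the T frames, so the attention cost grows linearly with the number of grid cells rather than quadratically.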
Related papers
- Track-On: Transformer-based Online Point Tracking with Memory [34.744546679670734]
We introduce Track-On, a simple transformer-based model designed for online long-term point tracking. Unlike prior methods that depend on full temporal modeling, our model processes video frames causally without access to future frames. At inference time, it employs patch classification and refinement to identify correspondences and track points with high accuracy.
arXiv Detail & Related papers (2025-01-30T17:04:11Z)
- TAPTR: Tracking Any Point with Transformers as Detection [33.50183504731619]
We propose a simple and strong framework for Tracking Any Point with TRansformers (TAPTR).
Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP.
Our framework demonstrates state-of-the-art performance on various TAP datasets with faster inference speed.
arXiv Detail & Related papers (2024-03-19T17:57:09Z)
- Multi-Scene Generalized Trajectory Global Graph Solver with Composite Nodes for Multiple Object Tracking [61.69892497726235]
Composite Node Message Passing Network (CoNo-Link) is a framework for modeling information over ultra-long frame sequences for association.
In addition to treating objects as nodes, as in prior methods, the network innovatively treats object trajectories as nodes for information interaction.
Our model can learn better predictions on longer time scales by adding composite nodes.
arXiv Detail & Related papers (2023-12-14T14:00:30Z)
- Dense Optical Tracking: Connecting the Dots [82.79642869586587]
DOT is a novel, simple and efficient method for solving the problem of point tracking in a video.
We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal trackers" like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker.
arXiv Detail & Related papers (2023-12-01T18:59:59Z)
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
arXiv Detail & Related papers (2023-06-14T17:07:51Z)
- Particle Videos Revisited: Tracking Through Occlusions Using Point Trajectories [29.258861811749103]
We revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem.
We re-build this classic approach using components that drive the current state-of-the-art in flow and object tracking.
We train our models using long-range amodal point trajectories mined from existing optical flow datasets.
arXiv Detail & Related papers (2022-04-08T16:05:48Z)
- Polygonal Point Set Tracking [50.445151155209246]
We propose a novel learning-based polygonal point set tracking method.
Our goal is to track corresponding points on the target contour.
We present visual-effects applications of our method on part distortion and text mapping.
arXiv Detail & Related papers (2021-05-30T17:12:36Z)
- Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking [79.80401607146987]
Existing object trackers usually learn a bounding-box based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture, which is equipped with a spatio-appearance memory network to learn accurate spatio-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z)