CoWTracker: Tracking by Warping instead of Correlation
- URL: http://arxiv.org/abs/2602.04877v1
- Date: Wed, 04 Feb 2026 18:58:59 GMT
- Title: CoWTracker: Tracking by Warping instead of Correlation
- Authors: Zihang Lai, Eldar Insafutdinov, Edgar Sucar, Andrea Vedaldi
- Abstract summary: We propose a dense point tracker that eschews cost volumes in favor of warping. Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate. Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP.
- Score: 53.834673070954494
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dense point tracking is a fundamental problem in computer vision, with applications ranging from video analysis to robotic manipulation. State-of-the-art trackers typically rely on cost volumes to match features across frames, but this approach incurs quadratic complexity in spatial resolution, limiting scalability and efficiency. In this paper, we propose CoWTracker, a novel dense point tracker that eschews cost volumes in favor of warping. Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate. Combined with a transformer architecture that performs joint spatiotemporal reasoning across all tracks, our design establishes long-range correspondences without computing feature correlations. Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP. Remarkably, the model also excels at optical flow, sometimes outperforming specialized methods on the Sintel, KITTI, and Spring benchmarks. These results suggest that warping-based architectures can unify dense point tracking and optical flow estimation.
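The warping step described in the abstract can be illustrated with bilinear sampling: given a current flow estimate, features from the target frame are sampled at the estimated correspondences, avoiding any H*W x H*W cost volume. The following is a minimal sketch, not the paper's implementation; all shapes and names are assumptions for illustration.

```python
import numpy as np

def bilinear_warp(feat, flow):
    """feat: (H, W, C) target-frame features; flow: (H, W, 2) current
    query->target flow estimate (dx, dy). Returns (H, W, C) features
    bilinearly sampled at the estimated correspondences (zero-padded
    outside the image)."""
    H, W, C = feat.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)
    x = xs + flow[..., 0]
    y = ys + flow[..., 1]
    x0 = np.floor(x).astype(int); y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def gather(yy, xx):
        # Fetch feat[yy, xx] where in bounds, zeros (and a mask) elsewhere.
        valid = (yy >= 0) & (yy < H) & (xx >= 0) & (xx < W)
        out = np.zeros((H, W, C), dtype=feat.dtype)
        out[valid] = feat[yy[valid], xx[valid]]
        return out, valid.astype(feat.dtype)[..., None]

    f00, m00 = gather(y0, x0); f01, m01 = gather(y0, x1)
    f10, m10 = gather(y1, x0); f11, m11 = gather(y1, x1)
    wx = wx[..., None]; wy = wy[..., None]
    return ((1 - wy) * ((1 - wx) * f00 * m00 + wx * f01 * m01)
            + wy * ((1 - wx) * f10 * m10 + wx * f11 * m11))
```

In an iterative refiner of this kind, the warped features are compared with the query-frame features to predict a flow update, and the warp is repeated with the refined estimate; the cost per iteration stays linear in the number of pixels.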
Related papers
- AllTracker: Efficient Dense Point Tracking at High Resolution [62.840979507761425]
We introduce AllTracker, a model that estimates long-range point tracks by estimating the flow field between a query frame and every other frame of a video. Unlike existing point tracking methods, our approach delivers high-resolution and dense (all-pixel) correspondence fields, which can be visualized as flow maps. The model is fast and parameter-efficient (16 million parameters), and delivers state-of-the-art point tracking accuracy at high resolution (i.e., tracking 768x1024 pixels on a 40G GPU).
arXiv Detail & Related papers (2025-06-08T22:55:06Z) - Online Dense Point Tracking with Streaming Memory [54.22820729477756]
Dense point tracking is a challenging task requiring the continuous tracking of every point in the initial frame throughout a substantial portion of a video. Recent point tracking algorithms usually depend on sliding windows for indirect information propagation from the first frame to the current one. We present a lightweight and fast model with Streaming memory for dense POint Tracking (SPOT) and online video processing.
arXiv Detail & Related papers (2025-03-09T06:16:49Z) - ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking [41.889032460337226]
ProTracker is a novel framework for accurate and robust long-term dense tracking of arbitrary points in videos. Its design effectively combines global semantic information with temporally aware low-level features. Experiments demonstrate that ProTracker attains state-of-the-art performance among optimization-based approaches.
arXiv Detail & Related papers (2025-01-06T18:55:52Z) - LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [53.5449912019877]
We present the Long-term Effective Any Point Tracking (LEAP) module. LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation. Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z) - Dense Optical Tracking: Connecting the Dots [82.79642869586587]
DOT is a novel, simple, and efficient method for point tracking in video.
We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal trackers" like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker.
arXiv Detail & Related papers (2023-12-01T18:59:59Z) - TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
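TAPIR's matching stage independently scans every frame for the best candidate match of the query point. A minimal sketch of such per-frame argmax matching over a dot-product correlation map is shown below; the function name, shapes, and similarity measure are assumptions for illustration, not the paper's code.

```python
import numpy as np

def match_query(query_feat, frame_feats):
    """query_feat: (C,) feature of the query point; frame_feats: (T, H, W, C)
    features of every frame. Returns (T, 2) per-frame (x, y) locations of the
    best match, taken as the argmax of the correlation map in each frame."""
    T, H, W, C = frame_feats.shape
    corr = frame_feats.reshape(T, H * W, C) @ query_feat   # (T, H*W) scores
    idx = corr.argmax(axis=1)                              # best cell per frame
    return np.stack([idx % W, idx // W], axis=1)           # (x, y) per frame
```

A refinement stage would then update these coarse per-frame estimates using correlations in a local window around each match, rather than over the whole frame.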
arXiv Detail & Related papers (2023-06-14T17:07:51Z) - MFT: Long-Term Tracking of Every Pixel [0.36832029288386137]
The Multi-Flow dense Tracker (MFT) is a novel method for dense, pixel-level, long-term tracking. The method exploits optical flows estimated between consecutive frames. It tracks densely, orders of magnitude faster than state-of-the-art point-tracking methods.
arXiv Detail & Related papers (2023-05-22T13:02:46Z) - Particle Videos Revisited: Tracking Through Occlusions Using Point Trajectories [29.258861811749103]
We revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem.
We re-build this classic approach using components that drive the current state-of-the-art in flow and object tracking.
We train our models using long-range amodal point trajectories mined from existing optical flow datasets.
arXiv Detail & Related papers (2022-04-08T16:05:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.