TAPIR: Tracking Any Point with per-frame Initialization and temporal
Refinement
- URL: http://arxiv.org/abs/2306.08637v2
- Date: Wed, 30 Aug 2023 14:28:37 GMT
- Title: TAPIR: Tracking Any Point with per-frame Initialization and temporal
Refinement
- Authors: Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf
Aytar, Joao Carreira, Andrew Zisserman
- Abstract summary: We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
- Score: 64.11385310305612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel model for Tracking Any Point (TAP) that effectively tracks
any queried point on any physical surface throughout a video sequence. Our
approach employs two stages: (1) a matching stage, which independently locates
a suitable candidate point match for the query point on every other frame, and
(2) a refinement stage, which updates both the trajectory and query features
based on local correlations. The resulting model surpasses all baseline methods
by a significant margin on the TAP-Vid benchmark, as demonstrated by an
approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model
facilitates fast inference on long and high-resolution video sequences. On a
modern GPU, our implementation has the capacity to track points faster than
real-time, and can be flexibly extended to higher-resolution videos. Given the
high-quality trajectories extracted from a large dataset, we demonstrate a
proof-of-concept diffusion model which generates trajectories from static
images, enabling plausible animations. Visualizations, source code, and
pretrained models can be found on our project webpage.
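The abstract's two-stage design lends itself to a compact sketch. The following is a minimal NumPy illustration of that idea only, not the authors' implementation: the function names, the global cost-volume matching, and the soft-argmax local refinement are all illustrative assumptions standing in for the learned components described in the paper (the official JAX code is linked on the project webpage).

```python
# Minimal sketch of the two-stage TAP idea: (1) per-frame matching via a
# global cost volume, (2) iterative refinement from local correlations.
# All names and details here are assumptions, not the authors' API.
import numpy as np

def matching_stage(query_feature, frame_features):
    """Stage 1: independently find the best match for the query on every frame.

    query_feature:  (C,) feature vector sampled at the query point.
    frame_features: (T, H, W, C) per-frame feature maps from any backbone.
    Returns an initial (T, 2) track of (x, y) positions and (T,) occlusion logits.
    """
    T, H, W, C = frame_features.shape
    # Cost volume: similarity between the query feature and every location.
    scores = np.einsum('c,thwc->thw', query_feature, frame_features)
    flat = scores.reshape(T, -1)
    idx = flat.argmax(axis=1)                      # best-matching pixel per frame
    ys, xs = np.unravel_index(idx, (H, W))
    track = np.stack([xs, ys], axis=-1).astype(np.float32)
    occlusion_logit = -flat.max(axis=1)            # low similarity -> likely occluded
    return track, occlusion_logit

def refinement_stage(track, occlusion_logit, query_feature, frame_features,
                     num_iters=4, radius=3, lr=0.5):
    """Stage 2: refine the track using local correlations around the current
    estimate. A soft-argmax over a small window stands in for the paper's
    learned iterative updates; the full model also updates the query
    features, which is omitted here for brevity."""
    T, H, W, C = frame_features.shape
    for _ in range(num_iters):
        for t in range(T):
            x, y = np.round(track[t]).astype(int)
            x0, x1 = max(x - radius, 0), min(x + radius + 1, W)
            y0, y1 = max(y - radius, 0), min(y + radius + 1, H)
            local = frame_features[t, y0:y1, x0:x1]               # (h, w, C)
            corr = np.einsum('c,hwc->hw', query_feature, local)   # local correlations
            weights = np.exp(corr - corr.max())
            weights /= weights.sum()
            # Soft-argmax over the local window gives a sub-pixel position update.
            grid_y, grid_x = np.mgrid[y0:y1, x0:x1]
            target = np.array([(weights * grid_x).sum(), (weights * grid_y).sum()])
            track[t] = (1 - lr) * track[t] + lr * target
    return track, occlusion_logit

# Example usage with random features as stand-ins for a real backbone:
# feats = np.random.randn(24, 64, 64, 128).astype(np.float32)
# q = feats[0, 32, 32]                      # query point at frame 0, (x=32, y=32)
# track, occ = matching_stage(q, feats)
# track, occ = refinement_stage(track, occ, q, feats)
```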
Related papers
- Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) must contend with high across-frame variation in object appearance and diverse degradation in some frames.
Most contemporary aggregation methods are tailored to two-stage detectors and suffer from high computational costs.
This study introduces a simple yet potent feature selection and aggregation strategy, gaining significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z) - PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point
Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, build 3D scenes to match the motion capture environments, and render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z) - A Fast and Map-Free Model for Trajectory Prediction in Traffics [2.435517936694533]
This paper proposes an efficient trajectory prediction model that is not dependent on traffic maps.
By combining an attention mechanism, an LSTM, a graph convolutional network, and a temporal transformer, our model learns rich dynamics and interaction information for all agents.
Our model achieves the highest performance among existing map-free methods and also exceeds most map-based state-of-the-art methods on the Argoverse dataset.
arXiv Detail & Related papers (2023-07-19T08:36:31Z) - TAP-Vid: A Benchmark for Tracking Any Point in a Video [84.94877216665793]
We formalize the problem of tracking arbitrary physical points on surfaces over longer video clips, naming it tracking any point (TAP).
We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks.
We propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
arXiv Detail & Related papers (2022-11-07T17:57:02Z) - Joint Feature Learning and Relation Modeling for Tracking: A One-Stream
Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z) - Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
arXiv Detail & Related papers (2021-06-17T13:06:24Z) - Fast Video Object Segmentation With Temporal Aggregation Network and
Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into video object segmentation (VOS).
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance in both speed and accuracy on the DAVIS benchmark without complicated bells and whistles, running at 0.14 seconds per frame with a J&F measure of 75.9%.
arXiv Detail & Related papers (2020-07-11T05:44:16Z)