CoTracker: It is Better to Track Together
- URL: http://arxiv.org/abs/2307.07635v3
- Date: Tue, 01 Oct 2024 13:15:53 GMT
- Title: CoTracker: It is Better to Track Together
- Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht
- Abstract summary: CoTracker is a transformer-based model that tracks a large number of 2D points in long video sequences.
We show that joint tracking significantly improves tracking accuracy and robustness, and allows CoTracker to track occluded points and points outside of the camera view.
- Abstract: We introduce CoTracker, a transformer-based model that tracks a large number of 2D points in long video sequences. Differently from most existing approaches that track points independently, CoTracker tracks them jointly, accounting for their dependencies. We show that joint tracking significantly improves tracking accuracy and robustness, and allows CoTracker to track occluded points and points outside of the camera view. We also introduce several innovations for this class of trackers, including using token proxies that significantly improve memory efficiency and allow CoTracker to track 70k points jointly and simultaneously at inference on a single GPU. CoTracker is an online algorithm that operates causally on short windows. However, it is trained utilizing unrolled windows as a recurrent network, maintaining tracks for long periods of time even when points are occluded or leave the field of view. Quantitatively, CoTracker substantially outperforms prior trackers on standard point-tracking benchmarks.
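For context, CoTracker is exposed through PyTorch Hub in the public facebookresearch/co-tracker repository. The sketch below is a minimal, non-authoritative usage example: the entry-point name ("cotracker2"), the (t, x, y) query convention, and the output shapes are assumptions taken from that repository's README and may differ between releases.

```python
# Minimal usage sketch, assuming the torch.hub entry point "cotracker2" and
# the (t, x, y) query convention from the facebookresearch/co-tracker README;
# check the repository for the exact API of the release you use.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.hub.load("facebookresearch/co-tracker", "cotracker2").to(device)

# Dummy video: (batch, frames, channels, height, width), pixel values in 0-255.
video = torch.randint(0, 256, (1, 48, 3, 384, 512), device=device).float()

# Each query is (start frame t, x, y); points may be introduced mid-video.
# All queried points are tracked *jointly* in a single forward pass, which is
# the paper's core claim: joint tracking exploits dependencies between points.
queries = torch.tensor(
    [[0.0, 100.0, 150.0],
     [0.0, 300.0, 200.0],
     [10.0, 250.0, 250.0]],
    device=device,
)[None]  # shape (1, N=3, 3)

pred_tracks, pred_visibility = model(video, queries=queries)
print(pred_tracks.shape)      # expected (1, 48, 3, 2): per-frame (x, y) per point
print(pred_visibility.shape)  # expected (1, 48, 3): visibility flag per point
```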
Related papers
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
- OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning [33.521077115333696]
We present a general framework, termed OneTracker, that unifies various tracking tasks.
OneTracker first performs large-scale pre-training on an RGB tracker called Foundation Tracker.
We then treat other modality information as prompts and build Prompt Tracker on top of Foundation Tracker.
arXiv Detail & Related papers (2024-03-14T17:59:13Z)
- SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking [26.405519771454102]
We introduce a Sequence-to-Sequence tracking paradigm and a tracker named SeqTrack3D to capture target motion across continuous frames.
This novel method ensures robust tracking by leveraging location priors from historical boxes, even in scenes with sparse points.
Experiments conducted on large-scale datasets show that SeqTrack3D achieves new state-of-the-art performance.
arXiv Detail & Related papers (2024-02-26T02:14:54Z)
- Tracking with Human-Intent Reasoning [64.69229729784008]
This work proposes a new tracking task: Instruction Tracking.
It provides implicit tracking instructions that require the tracker to perform tracking in video frames automatically.
TrackGPT is capable of performing complex reasoning-based tracking.
arXiv Detail & Related papers (2023-12-29T03:22:18Z)
- DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos [9.304179915575114]
DriveTrack is a new benchmark and data generation framework for keypoint tracking in real-world videos.
We release a dataset consisting of 1 billion point tracks across 24 hours of video, which is seven orders of magnitude greater than prior real-world benchmarks.
We show that fine-tuning keypoint trackers on DriveTrack improves accuracy on real-world scenes by up to 7%.
arXiv Detail & Related papers (2023-12-15T04:06:52Z)
- TopTrack: Tracking Objects By Their Top [13.020122353444497]
TopTrack is a joint detection-and-tracking method that uses the top of the object as a keypoint for detection instead of the center.
Experiments show that using the object top as the detection keypoint reduces the number of missed detections.
arXiv Detail & Related papers (2023-04-12T19:00:12Z)
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips; tracking is then performed both within and between the clips.
The benefits of this approach are twofold. First, the method is robust to accumulation and propagation of tracking errors, since chunking the video lets the tracker bypass interrupted frames.
Second, information from multiple frames is aggregated during clip-wise matching, yielding more accurate long-range track association than frame-wise matching (a generic sketch of the idea follows this entry).
arXiv Detail & Related papers (2022-12-20T10:33:17Z)
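As a hypothetical illustration of clip-wise association (the helper names below are invented for this sketch, not taken from the paper): a long video is cut into short clips, tracklets are built inside each clip, and tracklets are stitched across clip boundaries, here by greedy cosine similarity between appearance embeddings.

```python
# Hypothetical sketch of clip-wise association (names invented for this
# illustration; not the authors' code). A long video is cut into short
# clips, tracklets are built inside each clip, and tracklets are stitched
# across clip boundaries by appearance similarity.
import numpy as np

def split_into_clips(num_frames: int, clip_len: int) -> list[range]:
    """Cut frame indices [0, num_frames) into consecutive short clips."""
    return [range(s, min(s + clip_len, num_frames))
            for s in range(0, num_frames, clip_len)]

def associate(prev_tracklets: dict[int, np.ndarray],
              new_tracklets: dict[int, np.ndarray],
              thresh: float = 0.5) -> dict[int, int]:
    """Greedily match tracklet embeddings across a clip boundary.

    Returns a mapping {previous tracklet id -> new tracklet id}; unmatched
    new tracklets would start fresh identities, which is how chunking lets
    the tracker recover after interrupted frames.
    """
    matches: dict[int, int] = {}
    used: set[int] = set()
    for pid, pemb in prev_tracklets.items():
        best_id, best_sim = None, thresh
        for nid, nemb in new_tracklets.items():
            if nid in used:
                continue
            sim = float(pemb @ nemb /
                        (np.linalg.norm(pemb) * np.linalg.norm(nemb) + 1e-8))
            if sim > best_sim:
                best_id, best_sim = nid, sim
        if best_id is not None:
            matches[pid] = best_id
            used.add(best_id)
    return matches

# Toy usage: tracklet 0 reappears in the next clip with a similar embedding.
rng = np.random.default_rng(0)
print(split_into_clips(num_frames=100, clip_len=20))
prev = {0: rng.normal(size=8), 1: rng.normal(size=8)}
new = {10: prev[0] + 0.05 * rng.normal(size=8), 11: rng.normal(size=8)}
print(associate(prev, new))  # likely {0: 10}; 11 starts a new identity
```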
- Learning to Track Objects from Unlabeled Videos [63.149201681380305]
In this paper, we propose to learn an Unsupervised Single Object Tracker (USOT) from scratch.
To narrow the gap between unsupervised trackers and supervised counterparts, we propose an effective unsupervised learning approach composed of three stages.
Experiments show that the proposed USOT, learned from unlabeled videos, outperforms state-of-the-art unsupervised trackers by large margins.
arXiv Detail & Related papers (2021-08-28T22:10:06Z)
- LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search [104.84999119090887]
We present LightTrack, which uses neural architecture search (NAS) to design more lightweight and efficient object trackers.
Comprehensive experiments show that LightTrack is effective.
It finds trackers that outperform handcrafted state-of-the-art trackers such as SiamRPN++ and Ocean.
arXiv Detail & Related papers (2021-04-29T17:55:24Z)