TAO: A Large-Scale Benchmark for Tracking Any Object
- URL: http://arxiv.org/abs/2005.10356v1
- Date: Wed, 20 May 2020 21:07:28 GMT
- Title: TAO: A Large-Scale Benchmark for Tracking Any Object
- Authors: Achal Dave, Tarasha Khurana, Pavel Tokmakov, Cordelia Schmid, Deva Ramanan
- Abstract summary: The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, that average half a minute in length.
We ask annotators to label objects that move at any point in the video and to name them post factum.
Our vocabulary is both significantly larger than and qualitatively different from those of existing tracking datasets.
- Score: 95.87310116010185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For many years, multi-object tracking benchmarks have focused on a handful of
categories. Motivated primarily by surveillance and self-driving applications,
these datasets provide tracks for people, vehicles, and animals, ignoring the
vast majority of objects in the world. By contrast, in the related field of
object detection, the introduction of large-scale, diverse datasets (e.g.,
COCO) has fostered significant progress in developing highly robust solutions.
To bridge this gap, we introduce a similarly diverse dataset for Tracking Any
Object (TAO). It consists of 2,907 high-resolution videos, captured in diverse
environments, which are half a minute long on average. Importantly, we adopt a
bottom-up approach for discovering a large vocabulary of 833 categories, an
order of magnitude more than prior tracking benchmarks. To this end, we ask
annotators to label objects that move at any point in the video, and give names
to them post factum. Our vocabulary is both significantly larger and
qualitatively different from existing tracking datasets. To ensure scalability
of annotation, we employ a federated approach that focuses manual effort on
labeling tracks for those relevant objects in a video (e.g., those that move).
We perform an extensive evaluation of state-of-the-art trackers and make a
number of important discoveries regarding large-vocabulary tracking in an
open world. In particular, we show that existing single- and multi-object
trackers struggle when applied to this scenario in the wild, and that
detection-based, multi-object trackers are in fact competitive with
user-initialized ones. We hope that our dataset and analysis will boost further
progress in the tracking community.
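The evaluation behind this analysis is a federated, track-level mAP, in which predicted and ground-truth tracks are matched by a spatio-temporal IoU accumulated over frames. Below is a minimal sketch of that overlap measure in Python, assuming (x1, y1, x2, y2) boxes keyed by frame index; the function names are illustrative, not the official TAO toolkit API.

def box_inter_union(a, b):
    # Intersection and union areas of two axis-aligned (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter, area_a + area_b - inter

def track_iou(track_a, track_b):
    # Spatio-temporal IoU of two tracks, each a dict mapping frame -> box.
    inter_sum = union_sum = 0.0
    for frame in set(track_a) | set(track_b):
        if frame in track_a and frame in track_b:
            inter, union = box_inter_union(track_a[frame], track_b[frame])
        else:
            # A frame covered by only one track contributes its full box
            # area to the union and nothing to the intersection.
            box = track_a.get(frame, track_b.get(frame))
            inter, union = 0.0, (box[2] - box[0]) * (box[3] - box[1])
        inter_sum += inter
        union_sum += union
    return inter_sum / union_sum if union_sum else 0.0

As in box-level detection AP, a predicted track then counts as a true positive when this overlap with an unmatched ground-truth track of the same category exceeds a threshold (commonly 0.5).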
Related papers
- Tracking Reflected Objects: A Benchmark [12.770787846444406]
We introduce TRO, a benchmark specifically for Tracking Reflected Objects.
TRO includes 200 sequences with around 70,000 frames, each carefully annotated with bounding boxes.
To provide a stronger baseline, we propose a new tracker, HiP-HaTrack, which uses hierarchical features to improve performance.
arXiv Detail & Related papers (2024-07-07T02:22:45Z)
- Iterative Scale-Up ExpansionIoU and Deep Features Association for Multi-Object Tracking in Sports [26.33239898091364]
We propose a novel online and robust multi-object tracking approach named deep ExpansionIoU (Deep-EIoU) for sports scenarios.
Unlike conventional methods, we abandon the Kalman filter and instead leverage iterative scale-up ExpansionIoU and deep features for robust tracking in sports scenarios (the ExpansionIoU idea is sketched after this entry).
Our proposed method demonstrates remarkable effectiveness in tracking objects with irregular motion, achieving a score of 77.2% on the SportsMOT dataset and 85.4% on the SoccerNet-Tracking dataset.
arXiv Detail & Related papers (2023-06-22T17:47:08Z)
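The core idea admits a short sketch: enlarge both the track's last box and the candidate detection by an expansion ratio before computing IoU, so a fast-moving player that no longer overlaps its previous box can still be associated; the "iterative scale-up" retries unmatched tracks with progressively larger ratios. The expansion rule and ratio values below are assumptions for illustration, not the authors' reference implementation.

def expand(box, ratio):
    # Grow an (x1, y1, x2, y2) box by `ratio` of its width/height per side.
    x1, y1, x2, y2 = box
    dw, dh = (x2 - x1) * ratio, (y2 - y1) * ratio
    return (x1 - dw, y1 - dh, x2 + dw, y2 + dh)

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def expansion_iou(track_box, det_box, ratio):
    return iou(expand(track_box, ratio), expand(det_box, ratio))

prev = (100.0, 100.0, 140.0, 180.0)    # track's last box
det = (150.0, 110.0, 190.0, 190.0)     # detection that has moved away
print(iou(prev, det))                  # 0.0: plain IoU cannot match the pair
print(expansion_iou(prev, det, 0.5))   # > 0: expanded boxes overlap again
# Unmatched tracks would be retried with larger ratios, e.g. 0.3 -> 0.5 -> 0.7.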
- DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes [74.64897845999677]
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians.
Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available.
Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT.
arXiv Detail & Related papers (2023-02-15T14:10:42Z)
- Beyond SOT: Tracking Multiple Generic Objects at Once [141.36900362724975]
Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video.
We introduce a new large-scale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence.
Our approach achieves highly competitive results on single-object GOT datasets, setting a new state of the art on TrackingNet with a success rate AUC of 84.4%.
arXiv Detail & Related papers (2022-12-22T17:59:19Z) - End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, toward class-agnostic tracking that also performs well for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos [36.28269135795851]
We present a set classifier that improves tracklet classification accuracy by aggregating information from the multiple viewpoints contained in a tracklet (a sketch of this aggregation follows this entry).
By simply attaching our method to QDTrack on top of ResNet-101, we achieve a new state of the art of 19.9% and 15.7% TrackAP_50 on the TAO validation and test sets.
arXiv Detail & Related papers (2022-06-05T07:51:58Z)
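The aggregation can be illustrated with a small set classifier that lets the per-frame features of a tracklet attend to one another before pooling them into a single prediction. This is a hedged sketch in PyTorch, not the paper's architecture; the dimensions, the attention layer, and the mean pooling are assumptions.

import torch
import torch.nn as nn

class TrackletSetClassifier(nn.Module):
    # Classify a whole tracklet from the set of its per-frame features,
    # letting frames (viewpoints) exchange information before pooling.
    def __init__(self, feat_dim=256, num_classes=833):  # 833 = TAO's vocabulary size
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):                        # feats: (batch, frames, feat_dim)
        mixed, _ = self.attn(feats, feats, feats)    # frames attend to each other
        pooled = mixed.mean(dim=1)                   # aggregate over the tracklet
        return self.head(pooled)                     # one prediction per tracklet

logits = TrackletSetClassifier()(torch.randn(2, 8, 256))  # 2 tracklets, 8 frames each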
- DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion [56.1428110894411]
We propose a large-scale dataset for multi-human tracking, where humans have similar appearance, diverse motion, and extreme articulation.
As the dataset contains mostly group dancing videos, we name it "DanceTrack".
We benchmark several state-of-the-art trackers on our dataset and observe a significant performance drop on DanceTrack when compared against existing benchmarks.
arXiv Detail & Related papers (2021-11-29T16:49:06Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
- Discriminative Appearance Modeling with Multi-track Pooling for Real-time Multi-object Tracking [20.66906781151]
In multi-object tracking, the tracker maintains in its memory the appearance and motion information for each object in the scene.
Many approaches model each target in isolation and lack the ability to use all the targets in the scene to jointly update the memory.
We propose a training strategy adapted to multi-track pooling that generates hard tracking episodes online (the pooling idea is sketched after this entry).
arXiv Detail & Related papers (2021-01-28T18:12:39Z)
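A minimal sketch of the multi-track pooling idea, under my own simplifying assumptions rather than the paper's exact model: a detection is scored against each live track while also conditioning on a pooled summary of all competing tracks, so the match is judged relative to the rest of the scene.

import torch
import torch.nn as nn

class MultiTrackScorer(nn.Module):
    # Score a detection against each track from (detection, track,
    # pooled-other-tracks) features instead of modeling tracks in isolation.
    def __init__(self, feat_dim=128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim * 3, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 1)
        )

    def forward(self, det, tracks):       # det: (D,), tracks: (N, D) memories
        scores = []
        for i in range(tracks.shape[0]):
            others = torch.cat([tracks[:i], tracks[i + 1:]])
            context = (others.max(dim=0).values if others.shape[0]
                       else torch.zeros_like(det))     # pool competing tracks
            scores.append(self.score(torch.cat([det, tracks[i], context])))
        return torch.stack(scores).squeeze(-1)         # matching score per track

scores = MultiTrackScorer()(torch.randn(128), torch.randn(4, 128))  # 4 live tracks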
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.