Unified Transformer Tracker for Object Tracking
- URL: http://arxiv.org/abs/2203.15175v1
- Date: Tue, 29 Mar 2022 01:38:49 GMT
- Title: Unified Transformer Tracker for Object Tracking
- Authors: Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang,
Zhicheng Yan
- Abstract summary: We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
- Score: 58.65901124158068
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As an important area in computer vision, object tracking has formed two
separate communities that respectively study Single Object Tracking (SOT) and
Multiple Object Tracking (MOT). However, current methods in one tracking
scenario are not easily adapted to the other due to the divergent training
datasets and tracking objects of both tasks. Although UniTrack
(Wang et al., 2021) demonstrates that a shared appearance model with
multiple heads can be used to tackle individual tracking tasks, it fails to
exploit the large-scale tracking datasets for training and performs poorly on
single object tracking. In this work, we present the Unified Transformer
Tracker (UTT) to address tracking problems in different scenarios with one
paradigm. A track transformer is developed in our UTT to track the target in
both SOT and MOT. The correlation between the target and tracking frame
features is exploited to localize the target. We demonstrate that both SOT and
MOT tasks can be solved within this framework. The model can be simultaneously
end-to-end trained by alternately optimizing the SOT and MOT objectives on
the datasets of individual tasks. Extensive experiments are conducted on
several benchmarks with a unified model trained on SOT and MOT datasets. Code
will be available at https://github.com/Flowerfan/Trackron.
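The abstract states that the correlation between target features and tracking-frame features is used to localize the target. The following is only a minimal sketch of that general idea, not the track transformer from the paper: it assumes a single target feature vector, a small grid of frame feature vectors, a dot-product correlation, and a soft-argmax readout. The function names (correlation_map, soft_argmax, localize) and the toy data are hypothetical and chosen for illustration.

```python
import math


def correlation_map(target, frame):
    """Dot-product correlation between one target vector and every frame cell.

    target: list of floats (hypothetical target embedding)
    frame:  H x W grid (nested lists) of feature vectors of the same length
    """
    return [
        [sum(t * f for t, f in zip(target, cell)) for cell in row]
        for row in frame
    ]


def soft_argmax(scores):
    """Softmax over all grid cells, then the expected (row, col) position."""
    peak = max(s for row in scores for s in row)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(w for row in weights for w in row)
    row_pos = sum(r * w for r, row in enumerate(weights) for w in row) / total
    col_pos = sum(c * w for row in weights for c, w in enumerate(row)) / total
    return row_pos, col_pos


def localize(target, frame):
    """Assumed pipeline: correlate, then read out a sub-cell location."""
    return soft_argmax(correlation_map(target, frame))


# Toy data: a 3 x 3 frame where only cell (1, 2) resembles the target.
target = [1.0, 0.0]
frame = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
frame[1][2] = [10.0, 0.0]

row, col = localize(target, frame)
print(f"estimated target position: row={row:.3f}, col={col:.3f}")
```

In this toy setup the estimate lands close to cell (1, 2), because that is the only cell with a high correlation score. A soft-argmax is used here only because it gives a smooth location estimate from a score map; the readout in the actual model may differ.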
Related papers
- HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision [34.7347336548199]
In camera-based 3D multi-object tracking (MOT), the prevailing methods follow the tracking-by-query-propagation paradigm.
We present HSTrack, a novel plug-and-play method designed to co-facilitate multi-task learning for detection and tracking.
arXiv Detail & Related papers (2024-11-11T08:18:49Z)
- OmniTracker: Unifying Object Tracking by Tracking-with-Detection [119.51012668709502]
OmniTracker is presented to resolve all the tracking tasks with a fully shared network architecture, model weights, and inference pipeline.
Experiments on 7 tracking datasets, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19, demonstrate that OmniTracker achieves on-par or even better results than both task-specific and unified tracking models.
arXiv Detail & Related papers (2023-03-21T17:59:57Z)
- DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes [74.64897845999677]
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians.
Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available.
Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT.
arXiv Detail & Related papers (2023-02-15T14:10:42Z)
- Unifying Tracking and Image-Video Object Detection [54.91658924277527]
TrIVD (Tracking and Image-Video Detection) is the first framework that unifies image OD, video OD, and MOT within one end-to-end model.
To handle the discrepancies and semantic overlaps of category labels, TrIVD formulates detection/tracking as grounding and reasons about object categories.
arXiv Detail & Related papers (2022-11-20T20:30:28Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- InterTrack: Interaction Transformer for 3D Multi-Object Tracking [9.283656931246645]
3D multi-object tracking (MOT) is a key problem for autonomous vehicles.
Our proposed solution, InterTrack, generates discriminative object representations for data association.
We validate our approach on the nuScenes 3D MOT benchmark, where we observe significant improvements.
arXiv Detail & Related papers (2022-08-17T03:24:36Z)
- Track to Detect and Segment: An Online Multi-Object Tracker [81.15608245513208]
TraDeS is an online joint detection and tracking model, exploiting tracking clues to assist detection end-to-end.
TraDeS infers object tracking offset by a cost volume, which is used to propagate previous object features.
arXiv Detail & Related papers (2021-03-16T02:34:06Z)
- Probabilistic 3D Multi-Modal, Multi-Object Tracking for Autonomous Driving [22.693895321632507]
We propose a probabilistic, multi-modal, multi-object tracking system consisting of different trainable modules.
We show that our method outperforms the current state of the art on the nuScenes Tracking dataset.
arXiv Detail & Related papers (2020-12-26T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.