Global Tracking Transformers
- URL: http://arxiv.org/abs/2203.13250v1
- Date: Thu, 24 Mar 2022 17:58:04 GMT
- Title: Global Tracking Transformers
- Authors: Xingyi Zhou, Tianwei Yin, Vladlen Koltun, Philipp Krähenbühl
- Abstract summary: We present a novel transformer-based architecture for global multi-object tracking.
The core component is a global tracking transformer that operates on objects from all frames in the sequence.
Our framework seamlessly integrates into state-of-the-art large-vocabulary detectors to track any objects.
- Score: 76.58184022651596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel transformer-based architecture for global multi-object
tracking. Our network takes a short sequence of frames as input and produces
global trajectories for all objects. The core component is a global tracking
transformer that operates on objects from all frames in the sequence. The
transformer encodes object features from all frames, and uses trajectory
queries to group them into trajectories. The trajectory queries are object
features from a single frame and naturally produce unique trajectories. Our
global tracking transformer does not require intermediate pairwise grouping or
combinatorial association, and can be jointly trained with an object detector.
It achieves competitive performance on the popular MOT17 benchmark, with 75.3
MOTA and 59.1 HOTA. More importantly, our framework seamlessly integrates into
state-of-the-art large-vocabulary detectors to track any objects. Experiments
on the challenging TAO dataset show that our framework consistently improves
upon baselines that are based on pairwise association, outperforming published
works by a significant 7.7 tracking mAP. Code is available at
https://github.com/xingyizhou/GTR.
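The association step the abstract describes, in which trajectory queries (object features from a single frame) score against object features from every frame to form unique trajectories without pairwise grouping, can be sketched in a simplified form. This is an illustrative reconstruction, not the paper's implementation: the function name `associate`, the temperature `tau`, and the explicit "no match in this frame" slot are assumptions made for the sketch; see the GTR repository above for the actual model.

```python
import numpy as np

def associate(queries, frame_feats, tau=1.0):
    """Greedy sketch of query-based trajectory association.

    queries:     (Q, D) trajectory queries (object features from one frame)
    frame_feats: list of (N_t, D) arrays, detected-object features per frame
    tau:         softmax temperature (assumed hyperparameter)

    Returns a list of Q trajectories; each is a list of per-frame detection
    indices, with -1 where the "no match" slot wins.
    """
    trajectories = [[] for _ in range(len(queries))]
    for feats in frame_feats:
        # Similarity of every query to every detection in this frame,
        # plus one extra zero-scored column acting as a "no match" slot.
        scores = np.concatenate(
            [queries @ feats.T / tau, np.zeros((len(queries), 1))], axis=1)
        # Row-wise softmax: each query distributes probability over this
        # frame's detections (and the empty slot).
        probs = np.exp(scores - scores.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        picks = probs.argmax(axis=1)
        for q, p in enumerate(picks):
            trajectories[q].append(int(p) if p < feats.shape[0] else -1)
    return trajectories

# Two well-separated queries tracked over two frames whose detections
# appear in swapped order in the second frame.
tracks = associate(np.eye(2) * 10.0, [np.eye(2), np.eye(2)[::-1]])
```

Because each detection column competes under a per-query softmax, a query follows whichever detection it matches best in every frame, which is the sense in which queries "naturally produce unique trajectories" in the abstract; the real model learns these features end-to-end with the detector rather than using raw dot products.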
Related papers
- PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking [36.5272157173876]
We show that a pure Transformer can unify short- and long-term associations in a decoupled and online manner.
Experiments show that a classic Transformer architecture naturally suits the association problem and achieves a strong baseline.
This work pioneers a promising Transformer-based approach for the MOT task, and provides code to facilitate further research.
arXiv Detail & Related papers (2024-05-23T02:44:46Z)
- Tracking Transforming Objects: A Benchmark [2.53045657890708]
This study collects a novel dedicated dataset for Tracking Transforming Objects, called DTTO, which contains 100 sequences, amounting to approximately 9.3K frames.
We provide carefully hand-annotated bounding boxes for each frame within these sequences, making DTTO the pioneering benchmark dedicated to tracking transforming objects.
We thoroughly evaluate 20 state-of-the-art trackers on the benchmark, aiming to comprehend the performance of existing methods and provide a comparison for future research on DTTO.
arXiv Detail & Related papers (2024-04-28T11:24:32Z)
- Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer [0.8808021343665321]
We propose a light-weight and highly efficient Joint Detection and Tracking pipeline for the task of Multi-Object Tracking.
It is driven by a transformer based backbone instead of CNN, which is highly scalable with the input resolution.
As a result of our modifications, we reduce the overall model size of TransTrack by 58.73% and the complexity by 78.72%.
arXiv Detail & Related papers (2022-11-09T07:19:33Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z)
- TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z)
- TrackFormer: Multi-Object Tracking with Transformers [92.25832593088421]
TrackFormer is an end-to-end multi-object tracking and segmentation model based on an encoder-decoder Transformer architecture.
New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time.
TrackFormer achieves a seamless data association between frames in a new tracking-by-attention paradigm.
arXiv Detail & Related papers (2021-01-07T18:59:29Z)
- End-to-End Multi-Object Tracking with Global Response Map [23.755882375664875]
We present a completely end-to-end approach that takes an image sequence/video as input and directly outputs the located and tracked objects of learned types.
Specifically, with our introduced multi-object representation strategy, a global response map can be accurately generated over frames.
Experimental results on the MOT16 and MOT17 benchmarks show that our proposed online tracker achieves state-of-the-art performance on several tracking metrics.
arXiv Detail & Related papers (2020-07-13T12:30:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.