End-to-end Tracking with a Multi-query Transformer
- URL: http://arxiv.org/abs/2210.14601v1
- Date: Wed, 26 Oct 2022 10:19:37 GMT
- Title: End-to-end Tracking with a Multi-query Transformer
- Authors: Bruno Korbar and Andrew Zisserman
- Abstract summary: Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches to class-agnostic tracking that also performs well for unknown object classes.
- Score: 96.13468602635082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time. Our aim in this paper is to move beyond tracking-by-detection approaches, which perform well on datasets where the object classes are known, to class-agnostic tracking that also performs well for unknown object classes. To this end, we make the following three contributions: first, we introduce semantic detector queries that enable an object to be localized by specifying its approximate position, its appearance, or both; second, we use these queries within an auto-regressive framework for tracking, and propose a multi-query tracking transformer (MQT) model for simultaneous tracking and appearance-based re-identification (reID), built on the transformer architecture with deformable attention. This formulation allows the tracker to operate in a class-agnostic manner, and the model can be trained end-to-end; finally, we demonstrate that MQT performs competitively on standard MOT benchmarks, outperforms all baselines on generalised-MOT, and generalises well to much harder tracking problems such as tracking any object on the TAO dataset.
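To make the query mechanism concrete, here is a minimal sketch in PyTorch of how semantic detector queries and the autoregressive tracking loop could fit together. All names (`MultiQueryTracker`, `make_queries`, the projection heads) are hypothetical, and stock multi-head attention is substituted for the deformable attention used in the paper; treat this as an illustration of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiQueryTracker(nn.Module):
    """Sketch of a multi-query tracking decoder (hypothetical, not MQT's code).

    The paper pairs semantic queries with deformable attention; standard
    multi-head attention is used here to keep the example self-contained.
    """

    def __init__(self, dim=256, heads=8, layers=6):
        super().__init__()
        self.pos_proj = nn.Linear(4, dim)     # approximate box (cx, cy, w, h) -> query
        self.app_proj = nn.Linear(dim, dim)   # appearance / reID embedding -> query
        layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, layers)
        self.box_head = nn.Linear(dim, 4)     # refined box for each query
        self.reid_head = nn.Linear(dim, dim)  # appearance embedding for re-identification

    def make_queries(self, boxes=None, appearances=None):
        """Semantic detector queries: position, appearance, or both."""
        parts = []
        if boxes is not None:
            parts.append(self.pos_proj(boxes))
        if appearances is not None:
            parts.append(self.app_proj(appearances))
        return torch.stack(parts).sum(0)      # fuse whichever cues were given

    def forward(self, frame_feats, queries):
        # frame_feats: (B, HW, dim) flattened per-frame encoder features
        decoded = self.decoder(queries, frame_feats)
        return self.box_head(decoded).sigmoid(), self.reid_head(decoded)

# Autoregressive use: each frame's outputs seed the next frame's queries,
# so the same query slot keeps following the same (class-agnostic) object.
tracker = MultiQueryTracker()
queries = tracker.make_queries(boxes=torch.rand(1, 5, 4))        # 5 initial objects
for frame_feats in [torch.rand(1, 100, 256) for _ in range(3)]:  # dummy frames
    boxes, embeds = tracker(frame_feats, queries)
    queries = tracker.make_queries(boxes=boxes, appearances=embeds)
```

Because a query can be built from position alone, appearance alone, or both, the same decoder serves detection, tracking, and appearance-based re-identification without any class-specific head.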
Related papers
- ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association [15.161640917854363]
We introduce ADA-Track, a novel end-to-end framework for 3D MOT from multi-view cameras.
A learnable data association module based on edge-augmented cross-attention is integrated into the decoder layer of a DETR-based 3D detector.
arXiv Detail & Related papers (2024-05-14T19:02:33Z)
- OmniTracker: Unifying Object Tracking by Tracking-with-Detection [119.51012668709502]
OmniTracker is presented to solve all tracking tasks with a fully shared network architecture, model weights, and inference pipeline.
Experiments on 7 tracking datasets, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19, demonstrate that OmniTracker achieves on-par or even better results than both task-specific and unified tracking models.
arXiv Detail & Related papers (2023-03-21T17:59:57Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
- TrackFormer: Multi-Object Tracking with Transformers [92.25832593088421]
TrackFormer is an end-to-end multi-object tracking and segmentation model based on an encoder-decoder Transformer architecture.
New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time.
TrackFormer achieves seamless data association between frames in a new tracking-by-attention paradigm (a minimal sketch of this track-query loop follows this list).
arXiv Detail & Related papers (2021-01-07T18:59:29Z)
- End-to-End Multi-Object Tracking with Global Response Map [23.755882375664875]
We present a completely end-to-end approach that takes an image sequence or video as input and directly outputs the located and tracked objects of learned types.
Specifically, with our introduced multi-object representation strategy, a global response map can be accurately generated over frames.
Experimental results on the MOT16 and MOT17 benchmarks show that our proposed online tracker achieves state-of-the-art performance on several tracking metrics.
arXiv Detail & Related papers (2020-07-13T12:30:49Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
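Several of the entries above (TrackFormer, UTT, and MQT itself) share the track-query idea: an object's identity lives in a query slot that is carried from frame to frame, so association happens inside attention rather than as a separate matching step. The sketch below shows that per-frame lifecycle; `decode`, the thresholds, and the spawning rule are hypothetical placeholders, not any paper's actual interface.

```python
import torch

def tracking_by_attention_step(decode, frame_feats, track_q, det_q,
                               keep_thresh=0.5, spawn_thresh=0.7):
    """One frame of a track-query loop (a sketch; thresholds are made up).

    `decode` is any callable mapping (frame_feats, queries) to per-query
    confidence scores and updated query embeddings. Because each query slot
    owns one identity, no explicit bipartite matching is needed.
    """
    queries = torch.cat([track_q, det_q], dim=0)
    scores, out = decode(frame_feats, queries)    # (N,), (N, dim)
    n = track_q.shape[0]
    alive = out[:n][scores[:n] > keep_thresh]     # tracks that keep their object
    born = out[n:][scores[n:] > spawn_thresh]     # detections that start new tracks
    return torch.cat([alive, born], dim=0)        # next frame's track queries

# Toy usage with a stand-in decoder:
def dummy_decode(feats, queries):
    return torch.rand(queries.shape[0]), queries

track_q = torch.empty(0, 256)                    # no tracks before the first frame
det_q = torch.rand(10, 256)                      # learned detection queries
for frame_feats in [torch.rand(100, 256) for _ in range(3)]:
    track_q = tracking_by_attention_step(dummy_decode, frame_feats, track_q, det_q)
```

The design choice this illustrates is why such trackers can be trained end-to-end: keeping and spawning tracks are just thresholded query outputs, so the whole loop is differentiable apart from the thresholding itself.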