MOTR: End-to-End Multiple-Object Tracking with TRansformer
- URL: http://arxiv.org/abs/2105.03247v1
- Date: Fri, 7 May 2021 13:27:01 GMT
- Title: MOTR: End-to-End Multiple-Object Tracking with TRansformer
- Authors: Fangao Zeng, Bin Dong, Tiancai Wang, Cheng Chen, Xiangyu Zhang, Yichen Wei
- Abstract summary: We present MOTR, the first fully end-to-end multiple object tracking framework.
It learns to model the long-range temporal variation of the objects.
Results show that MOTR achieves state-of-the-art performance.
- Score: 31.78906135775541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The key challenge in the multiple-object tracking (MOT) task is temporal
modeling of the object under track. Existing tracking-by-detection methods adopt
simple heuristics, such as spatial or appearance similarity. Such heuristics,
despite their prevalence, are overly simple and insufficient to model complex
variations, such as tracking through occlusion. Inherently, existing methods
lack the ability to learn temporal variations from data. In this paper, we
present MOTR, the first fully end-to-end multiple-object tracking framework. It
learns to model the long-range temporal variation of the objects. It performs
temporal association implicitly and avoids previous explicit heuristics. Built
on Transformer and DETR, MOTR introduces the concept of "track query". Each
track query models the entire track of an object. It is transferred and updated
frame-by-frame to perform object detection and tracking, in a seamless manner.
A temporal aggregation network combined with multi-frame training is proposed to
model long-range temporal relations. Experimental results show that MOTR
achieves state-of-the-art performance. Code is available at
https://github.com/megvii-model/MOTR.
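The track-query bookkeeping described in the abstract can be sketched in a few lines. The sketch below is a hypothetical simplification, not the released MOTR implementation: the Transformer decoder is replaced by a stub whose confidence scores are supplied by the caller, and all names (`TrackQuery`, `step`, thresholds) are illustrative.

```python
# Hedged sketch of MOTR-style track-query propagation (illustrative only).
from dataclasses import dataclass
from itertools import count

_ids = count(1)  # toy track-id generator

@dataclass
class TrackQuery:
    track_id: int
    embedding: list   # stand-in for a learned query vector
    score: float = 1.0

def decode_stub(track_queries, n_detect, frame_scores):
    """Stand-in for the DETR-style decoder: one confidence score per
    query, track queries first, then fresh detect queries."""
    return frame_scores

def step(track_queries, frame_scores, n_detect=2, keep_thr=0.5, birth_thr=0.7):
    scores = decode_stub(track_queries, n_detect, frame_scores)
    kept = []
    # Track queries are transferred frame-to-frame; retire low-confidence ones.
    for tq, s in zip(track_queries, scores):
        if s >= keep_thr:
            tq.score = s
            kept.append(tq)
    # High-confidence detect queries spawn new track queries (newborn objects).
    for s in scores[len(track_queries):]:
        if s >= birth_thr:
            kept.append(TrackQuery(next(_ids), embedding=[0.0], score=s))
    return kept

tracks = []
tracks = step(tracks, frame_scores=[0.9, 0.1])        # frame 1: one birth
tracks = step(tracks, frame_scores=[0.8, 0.95, 0.2])  # frame 2: keep + one birth
# track ids are now [1, 2]
```

No explicit matching step appears here: a track query keeps its identity simply by being carried forward, which is the sense in which association is implicit.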
Related papers
- STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT)
Historical embedding features are used to model the ReID and detection feature representations in sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z)
- Contrastive Learning for Multi-Object Tracking with Transformers [79.61791059432558]
We show how DETR can be turned into a MOT model by employing an instance-level contrastive loss.
Our training scheme learns object appearances while preserving detection capabilities and with little overhead.
Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset.
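An instance-level contrastive objective of the kind summarized above can be illustrated with a minimal InfoNCE-style sketch. The function below is a hypothetical, self-contained approximation, not the paper's actual loss: it pulls embeddings with the same track identity together and pushes different identities apart via a softmax over cosine similarities.

```python
# Toy instance-level contrastive loss (InfoNCE-style); illustrative only.
import math

def contrastive_loss(embeddings, identities, temperature=0.1):
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    loss, n_pos = 0.0, 0
    for i, (ei, idi) in enumerate(zip(embeddings, identities)):
        for j, (ej, idj) in enumerate(zip(embeddings, identities)):
            if i == j or idi != idj:
                continue
            # -log softmax of the positive pair over all other candidates
            logits = [cos(ei, ek) / temperature
                      for k, ek in enumerate(embeddings) if k != i]
            pos = cos(ei, ej) / temperature
            loss += -pos + math.log(sum(math.exp(l) for l in logits))
            n_pos += 1
    return loss / max(n_pos, 1)

# Same-identity embeddings aligned -> low loss; scrambled -> high loss.
aligned = contrastive_loss([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
                           [1, 1, 2, 2])
mixed = contrastive_loss([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]],
                         [1, 1, 2, 2])
```

Because the loss operates on per-object embeddings rather than on the detection head, the detector's localization capability is left untouched, which matches the "little overhead" claim.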
arXiv Detail & Related papers (2023-11-14T10:07:52Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking [30.357116118917368]
We propose an end-to-end multi-camera 3D multi-object tracking framework.
We name it "Past-and-Future reasoning for Tracking" (PFTrack)
arXiv Detail & Related papers (2023-02-07T23:46:34Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers [20.806348407522083]
MO3TR is an end-to-end online multi-object tracking framework.
It encodes object interactions into long-term temporal embeddings.
It handles track initiation and termination without the need for an explicit data association module.
arXiv Detail & Related papers (2021-03-27T07:23:38Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on KITTI, and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
- Discriminative Appearance Modeling with Multi-track Pooling for Real-time Multi-object Tracking [20.66906781151]
In multi-object tracking, the tracker maintains in its memory the appearance and motion information for each object in the scene.
Many approaches model each target in isolation and lack the ability to use all the targets in the scene to jointly update the memory.
We propose a training strategy adapted to multi-track pooling which generates hard tracking episodes online.
arXiv Detail & Related papers (2021-01-28T18:12:39Z)
- Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking [94.24393546459424]
We introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association.
DMM-Net achieves a PR-MOTA score of 12.80 @ 120+ fps on the popular UA-DETRAC challenge, with better performance at orders-of-magnitude higher speed.
We also contribute a synthetic large-scale public dataset Omni-MOT for vehicle tracking that provides precise ground-truth annotations.
arXiv Detail & Related papers (2020-08-20T08:05:33Z)
- SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars.
We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects.
arXiv Detail & Related papers (2020-08-18T03:40:25Z)
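The soft-association idea in the last entry, attention computing track embeddings that encode dependencies between observed objects, can be illustrated with a toy scaled dot-product self-attention over per-object embeddings. This is a generic sketch under assumed names (`self_attention`), not SoDA's architecture.

```python
# Toy self-attention over object embeddings (illustrative only):
# each object's embedding is refined as a softmax-weighted average of
# all observed objects, softly encoding inter-object dependencies.
import math

def self_attention(embs):
    d = len(embs[0])
    scale = math.sqrt(d)
    out = []
    for q in embs:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in embs]
        m = max(scores)                      # stabilize the softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[dim] for wi, v in zip(w, embs)) / z
                    for dim in range(d)])
    return out

# Three observed objects; each refined embedding is a convex combination
# of all three, so no hard one-to-one association is ever committed to.
refined = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
```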
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.