Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers
- URL: http://arxiv.org/abs/2103.14829v1
- Date: Sat, 27 Mar 2021 07:23:38 GMT
- Title: Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers
- Authors: Tianyu Zhu, Markus Hiller, Mahsa Ehsanpour, Rongkai Ma, Tom Drummond, Hamid Rezatofighi
- Abstract summary: MO3TR is an end-to-end online multi-object tracking framework.
It encodes object interactions into long-term temporal embeddings.
It learns to handle occlusions and track initiation and termination without the need for an explicit data association module.
- Score: 20.806348407522083
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tracking a time-varying indefinite number of objects in a video sequence over
time remains a challenge despite recent advances in the field. Ignoring
long-term temporal information, most existing approaches are not able to
properly handle multi-object tracking challenges such as occlusion. To address
these shortcomings, we present MO3TR: a truly end-to-end Transformer-based
online multi-object tracking (MOT) framework that learns to handle occlusions,
track initiation and termination without the need for an explicit data
association module or any heuristics/post-processing. MO3TR encodes object
interactions into long-term temporal embeddings using a combination of spatial
and temporal Transformers, and recursively uses the information jointly with
the input data to estimate the states of all tracked objects over time. The
spatial attention mechanism enables our framework to learn implicit
representations between all the objects and the objects to the measurements,
while the temporal attention mechanism focuses on specific parts of past
information, allowing our approach to resolve occlusions over multiple frames.
Our experiments demonstrate the potential of this new approach, reaching new
state-of-the-art results on multiple MOT metrics for two popular multi-object
tracking benchmarks. Our code will be made publicly available.
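The abstract describes the spatial and temporal attention only at a high level. The following is a minimal NumPy sketch of how one recursive update could combine the two. It rests on several assumptions. It uses single-head dot-product attention with no learned projections. Random vectors stand in for learned track embeddings and image features. Track initiation and termination are left out. This is not MO3TR's architecture or the authors' code, and every function name and the `max_len` window are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the maximum for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Single-head scaled dot-product attention with no learned projections.

    q: (M, d) queries; k, v: (L, d) keys and values.
    Returns the (M, d) output and the (M, L) attention weights.
    """
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


def temporal_attention(history: np.ndarray) -> np.ndarray:
    """history: (T, N, d) embeddings of N tracks over the last T frames.

    Each track queries its own past with its latest embedding, so the
    weighting over earlier frames depends on content. That is the mechanism
    a trained model could use to down-weight, e.g., occluded frames.
    """
    _, n_tracks, dim = history.shape
    out = np.empty((n_tracks, dim))
    for i in range(n_tracks):
        past = history[:, i, :]                    # (T, d) for track i
        ctx, _ = attention(past[-1:], past, past)  # latest frame as query
        out[i] = ctx[0]
    return out


def spatial_attention(tracks: np.ndarray, measurements: np.ndarray) -> np.ndarray:
    """Each track attends jointly to all tracks and all measurement tokens."""
    tokens = np.concatenate([tracks, measurements], axis=0)  # (N + P, d)
    ctx, _ = attention(tracks, tokens, tokens)
    return ctx                                               # (N, d)


def track_step(history: np.ndarray, measurements: np.ndarray,
               max_len: int = 30) -> np.ndarray:
    """One recursive update: temporal summary -> spatial refinement -> append.

    Only the most recent max_len frames are kept (hypothetical window size).
    """
    summary = temporal_attention(history)
    updated = spatial_attention(summary, measurements)
    history = np.concatenate([history, updated[None]], axis=0)
    return history[-max_len:]


# Usage: N=3 tracks, d=8, and P=10 random "image tokens" per frame.
rng = np.random.default_rng(0)
history = rng.normal(size=(4, 3, 8))  # T=4 past frames
for _ in range(2):
    history = track_step(history, rng.normal(size=(10, 8)))
print(history.shape)  # (6, 3, 8)
```

The output of each step is fed back into the history for the next frame. This mirrors the abstract's description of recursively using past information jointly with the new input. The bounded window keeps memory constant over long sequences.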
Related papers
- Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking [15.533652456081374]
Multi-object tracking (MOT) endeavors to precisely estimate identities and positions of multiple objects over time.
Modern detectors may occasionally miss some objects in certain frames, causing trackers to cease tracking prematurely.
We propose BUSCA, meaning 'to search', a versatile framework compatible with any online tracking-by-detection (TbD) system.
Date: 2024-07-14T10:45:12Z
- Transformer Network for Multi-Person Tracking and Re-Identification in Unconstrained Environment [0.6798775532273751]
Multi-object tracking (MOT) has profound applications in a variety of fields, including surveillance, sports analytics, self-driving, and cooperative robotics.
We put forward an integrated MOT method that combines object detection and identity linkage within a single, end-to-end trainable framework.
Our system leverages a robust spatio-temporal memory module that retains extensive historical observations and effectively encodes them using an attention-based aggregator.
Date: 2023-12-19T08:15:22Z
- TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses [51.60422927416087]
3D multi-object tracking (MOT) is vital for many applications, including autonomous vehicles and service robots.
We present TrajectoryFormer, a novel point-cloud-based 3D MOT framework.
Date: 2023-06-09T13:31:50Z
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative to frame-wise association by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to the accumulation or propagation of tracking errors, as chunking the video allows it to bypass interrupted frames.
Second, the multiple frame information is aggregated during the clip-wise matching, resulting in a more accurate long-range track association than the current frame-wise matching.
Date: 2022-12-20T10:33:17Z
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches toward class-agnostic tracking that also performs well for unknown object classes.
Date: 2022-10-26T10:19:37Z
- MOTR: End-to-End Multiple-Object Tracking with TRansformer [31.78906135775541]
We present MOTR, the first fully end-to-end multiple object tracking framework.
It learns to model the long-range temporal variation of the objects.
Results show that MOTR achieves state-of-the-art performance.
Date: 2021-05-07T13:27:01Z
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
Date: 2021-03-26T04:43:04Z
- TrackFormer: Multi-Object Tracking with Transformers [92.25832593088421]
TrackFormer is an end-to-end multi-object tracking and segmentation model based on an encoder-decoder Transformer architecture.
New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time.
TrackFormer achieves a seamless data association between frames in a new tracking-by-attention paradigm.
Date: 2021-01-07T18:59:29Z
- SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars.
We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects.
Date: 2020-08-18T03:40:25Z
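The clip-wise matching idea summarized for Tracking by Associating Clips can be illustrated with a small NumPy sketch. This is a simplified stand-in under stated assumptions, not that paper's method. Each tracklet's per-frame appearance embeddings are averaged into one clip-level descriptor, and tracklets in two clips are linked by greedy cosine-similarity matching. The greedy assignment, the 0.5 threshold, and all function names are hypothetical. Within each clip, some frame-wise tracker is assumed to have already produced the tracklets; only the between-clip step is shown.

```python
import numpy as np


def split_into_clips(num_frames: int, clip_len: int) -> list:
    """Chunk frame indices 0..num_frames-1 into consecutive clips."""
    return [range(s, min(s + clip_len, num_frames))
            for s in range(0, num_frames, clip_len)]


def clip_descriptor(frame_embs: np.ndarray) -> np.ndarray:
    """Average a tracklet's (F, d) per-frame embeddings into one unit vector.

    Pooling over several frames means a single noisy frame shifts the
    descriptor less than it would shift a frame-wise comparison.
    """
    mean = frame_embs.mean(axis=0)
    return mean / (np.linalg.norm(mean) + 1e-12)


def match_clips(tracklets_a: list, tracklets_b: list, threshold: float = 0.5) -> list:
    """Greedy one-to-one matching between the tracklets of two clips.

    tracklets_a, tracklets_b: lists of (F_i, d) arrays. Returns (i, j) pairs
    linking tracklet i of clip A to tracklet j of clip B.
    """
    if not tracklets_a or not tracklets_b:
        return []
    desc_a = np.stack([clip_descriptor(t) for t in tracklets_a])
    desc_b = np.stack([clip_descriptor(t) for t in tracklets_b])
    sim = desc_a @ desc_b.T  # cosine similarities of unit vectors
    pairs, used_a, used_b = [], set(), set()
    # Visit all candidate pairs from most to least similar.
    for flat in np.argsort(-sim, axis=None):
        i, j = (int(x) for x in np.unravel_index(flat, sim.shape))
        if sim[i, j] < threshold:
            break  # every remaining pair is even less similar
        if i in used_a or j in used_b:
            continue
        pairs.append((i, j))
        used_a.add(i)
        used_b.add(j)
    return pairs


# Usage: two hypothetical identities, observed with noise in two clips.
rng = np.random.default_rng(0)
ids = rng.normal(size=(2, 16))
clip_a = [ids[0] + 0.1 * rng.normal(size=(3, 16)), ids[1] + 0.1 * rng.normal(size=(3, 16))]
clip_b = [ids[1] + 0.1 * rng.normal(size=(3, 16)), ids[0] + 0.1 * rng.normal(size=(3, 16))]
print(sorted(match_clips(clip_a, clip_b)))  # [(0, 1), (1, 0)]
```

Aggregating over a clip before matching is the design point. The comparison uses several frames of evidence per tracklet rather than a single frame pair.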
This list is automatically generated from the titles and abstracts of the papers on this site.