RelationTrack: Relation-aware Multiple Object Tracking with Decoupled
Representation
- URL: http://arxiv.org/abs/2105.04322v1
- Date: Mon, 10 May 2021 13:00:40 GMT
- Title: RelationTrack: Relation-aware Multiple Object Tracking with Decoupled
Representation
- Authors: En Yu, Zhuoling Li, Shoudong Han and Hongwei Wang
- Abstract summary: Existing online multiple object tracking (MOT) algorithms often consist of two subtasks, detection and re-identification (ReID).
To enhance inference speed and reduce complexity, current methods commonly integrate these two subtasks into a unified framework.
We devise a module named Global Context Disentangling (GCD) that decouples the learned representation into detection-specific and ReID-specific embeddings.
To capture the global semantic relations that local association cues miss, we develop a module, referred to as Guided Transformer Encoder (GTE), which combines the powerful reasoning ability of the Transformer encoder with deformable attention.
- Score: 3.356734463419838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing online multiple object tracking (MOT) algorithms often consist of
two subtasks, detection and re-identification (ReID). To enhance inference
speed and reduce complexity, current methods commonly integrate these two
subtasks into a unified framework. Nevertheless, detection and ReID demand
diverse features, which leads to an optimization contradiction during
training. To alleviate this contradiction, we devise a module named Global
Context Disentangling (GCD) that decouples the learned representation into
detection-specific and ReID-specific embeddings. As such, this module provides
an implicit means to balance the different requirements of the two subtasks.
Moreover, we observe that preceding MOT methods typically leverage only local
information to associate the detected targets and neglect the global semantic
relations among them. To resolve this restriction, we develop a module,
referred to as Guided Transformer Encoder (GTE), which combines the powerful
reasoning ability of the Transformer encoder with deformable attention. Unlike
previous works, GTE avoids analyzing all pixels and only captures the
relations between query nodes and a few adaptively selected key samples, which
makes it computationally efficient. Extensive experiments on the MOT16, MOT17
and MOT20 benchmarks demonstrate the superiority of the proposed MOT
framework, named RelationTrack. The results indicate that RelationTrack
surpasses preceding methods significantly and establishes a new
state-of-the-art, e.g., IDF1 of 70.5% and MOTA of 67.2% on MOT20.
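To make the decoupling idea concrete, below is a minimal PyTorch sketch of a GCD-style head. It assumes a FairMOT-like shared backbone feature map and uses attention-pooled global context to modulate two task-specific branches; the module name, layer choices, and shapes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical GCD-style decoupling head (illustrative sketch, not the authors' code).
# Assumes a shared backbone feature map of shape (B, C, H, W), as in FairMOT-like trackers.
import torch
import torch.nn as nn


class GlobalContextDisentangle(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv producing per-pixel logits for attention-pooled global context
        self.context_logits = nn.Conv2d(channels, 1, kernel_size=1)
        # Two separate transforms map the shared context to task-specific modulations
        self.det_transform = nn.Sequential(
            nn.Conv2d(channels, channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1))
        self.reid_transform = nn.Sequential(
            nn.Conv2d(channels, channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1))

    def forward(self, feat: torch.Tensor):
        b, c, h, w = feat.shape
        # Attention-pooled global context vector, shape (B, C, 1, 1)
        attn = self.context_logits(feat).flatten(2).softmax(dim=-1)   # (B, 1, H*W)
        context = torch.bmm(feat.flatten(2), attn.transpose(1, 2))    # (B, C, 1)
        context = context.view(b, c, 1, 1)
        # The shared feature is modulated differently for each task, yielding
        # detection-specific and ReID-specific embeddings.
        det_feat = feat + self.det_transform(context)
        reid_feat = feat + self.reid_transform(context)
        return det_feat, reid_feat


# Usage: det_feat feeds the detection heads, reid_feat feeds the ReID embedding head.
feat = torch.randn(2, 64, 152, 272)
det_feat, reid_feat = GlobalContextDisentangle(64)(feat)
```

The GTE idea, attending from each query to only a few adaptively sampled key locations via deformable attention, can likewise be illustrated with a simplified single-scale sketch. The class below is a hypothetical stand-in under those assumptions, not the paper's module.

```python
# Hypothetical single-scale deformable-attention sketch (not the paper's GTE module):
# each query attends only to a few sampled key locations instead of all pixels.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDeformableAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int = 4, n_points: int = 4):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads, self.n_points, self.head_dim = n_heads, n_points, dim // n_heads
        self.offsets = nn.Linear(dim, n_heads * n_points * 2)   # per-head sampling offsets
        self.weights = nn.Linear(dim, n_heads * n_points)       # per-sample attention weights
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query, ref_points, value, spatial_shape):
        # query: (B, Nq, C); ref_points: (B, Nq, 2) with (x, y) in [0, 1];
        # value: (B, H*W, C); spatial_shape: (H, W)
        B, Nq, C = query.shape
        H, W = spatial_shape
        v = self.value_proj(value).view(B, H, W, self.n_heads, self.head_dim)
        v = v.permute(0, 3, 4, 1, 2).reshape(B * self.n_heads, self.head_dim, H, W)
        offs = self.offsets(query).view(B, Nq, self.n_heads, self.n_points, 2)
        attn = self.weights(query).view(B, Nq, self.n_heads, self.n_points).softmax(-1)
        # Sampling locations = reference point + learned offsets (normalized),
        # mapped to [-1, 1] for grid_sample.
        scale = torch.tensor([W, H], dtype=query.dtype, device=query.device)
        locs = ref_points[:, :, None, None, :] + offs / scale
        grid = (2 * locs - 1).permute(0, 2, 1, 3, 4).reshape(B * self.n_heads, Nq, self.n_points, 2)
        sampled = F.grid_sample(v, grid, align_corners=False)   # (B*heads, head_dim, Nq, n_points)
        attn = attn.permute(0, 2, 1, 3).reshape(B * self.n_heads, 1, Nq, self.n_points)
        out = (sampled * attn).sum(-1)                          # weighted sum over sampled keys
        out = out.view(B, self.n_heads, self.head_dim, Nq).permute(0, 3, 1, 2).reshape(B, Nq, C)
        return self.out_proj(out)


# Usage: 10 queries with reference points attend to n_points sampled keys per head.
attn_mod = SimpleDeformableAttention(64)
out = attn_mod(torch.randn(2, 10, 64), torch.rand(2, 10, 2), torch.randn(2, 32 * 32, 64), (32, 32))
```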
Related papers
- Contrastive Learning for Multi-Object Tracking with Transformers [79.61791059432558]
We show how DETR can be turned into a MOT model by employing an instance-level contrastive loss.
Our training scheme learns object appearances while preserving detection capabilities and with little overhead.
Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset.
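As a rough illustration of the instance-level contrastive objective mentioned above, the snippet below computes an InfoNCE-style loss over per-object embeddings, treating embeddings with the same track ID across frames as positives. This is a generic sketch under those assumptions, not the cited paper's training scheme.

```python
# Generic InfoNCE-style instance contrastive loss over object embeddings
# (illustrative; not the exact loss used in the cited paper).
import torch
import torch.nn.functional as F


def instance_contrastive_loss(emb: torch.Tensor, track_ids: torch.Tensor, tau: float = 0.1):
    """emb: (N, D) embeddings of detected instances from a batch of frames.
    track_ids: (N,) integer identity labels; same id across frames -> positive pair."""
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t() / tau                                  # (N, N) scaled cosine similarities
    n = emb.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=emb.device)
    pos_mask = (track_ids[:, None] == track_ids[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))            # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)
    # Average log-probability of positives for anchors that have at least one positive
    has_pos = pos_mask.any(dim=1)
    loss = -pos_log_prob.sum(1)[has_pos] / pos_mask.sum(1)[has_pos]
    return loss.mean()


# Example: 6 detections, three identities seen in two consecutive frames.
emb = torch.randn(6, 128, requires_grad=True)
ids = torch.tensor([0, 1, 2, 0, 1, 2])
loss = instance_contrastive_loss(emb, ids)
```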
arXiv Detail & Related papers (2023-11-14T10:07:52Z)
- Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework on semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We also employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking [27.74953961900086]
Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.
We present Co-MOT, a simple and effective method to facilitate e2e-MOT by a novel coopetition label assignment with a shadow concept.
arXiv Detail & Related papers (2023-05-22T05:18:34Z)
- Transformer-based assignment decision network for multiple object tracking [0.0]
We introduce the Transformer-based Assignment Decision Network (TADN), which tackles data association without the need for explicit optimization during inference.
Our proposed approach outperforms the state-of-the-art in most evaluation metrics despite its simple nature as a tracker.
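The idea of deciding data association with a network rather than an explicit optimizer (such as the Hungarian algorithm) can be sketched as follows; the attention-style pairwise scorer and the greedy decoding are assumptions made for illustration, not the actual TADN architecture.

```python
# Illustrative learned-assignment sketch: score track/detection pairs with projected
# dot-product attention and decode an assignment without explicit optimization
# (hypothetical design, not the actual TADN network).
import torch
import torch.nn as nn


class AssignmentScorer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # projects track embeddings (queries)
        self.k = nn.Linear(dim, dim)   # projects detection embeddings (keys)

    def forward(self, tracks: torch.Tensor, dets: torch.Tensor) -> torch.Tensor:
        # tracks: (T, D), dets: (N, D) -> (T, N) assignment logits
        return self.q(tracks) @ self.k(dets).t() / tracks.size(1) ** 0.5

    @torch.no_grad()
    def decode(self, logits: torch.Tensor, thresh: float = 0.0):
        # Greedy decode: each track (processed in order of its best score) takes its
        # top-scoring detection if the score passes a threshold and the detection is free.
        matches, taken = [], set()
        for t in logits.max(dim=1).values.argsort(descending=True).tolist():
            d = int(logits[t].argmax())
            if logits[t, d] > thresh and d not in taken:
                matches.append((t, d))
                taken.add(d)
        return matches


# Example: match 5 existing tracks against 7 new detections.
scorer = AssignmentScorer(128)
matches = scorer.decode(scorer(torch.randn(5, 128), torch.randn(7, 128)))
```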
arXiv Detail & Related papers (2022-08-06T19:47:32Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Multi-object Tracking with a Hierarchical Single-branch Network [31.680667324595557]
We propose an online multi-object tracking framework based on a hierarchical single-branch network.
Our novel iHOIM loss function unifies the objectives of the two sub-tasks and encourages better detection performance.
Experimental results on MOT16 and MOT20 datasets show that we can achieve state-of-the-art tracking performance.
arXiv Detail & Related papers (2021-01-06T12:14:58Z)
- Rethinking the competition between detection and ReID in Multi-Object Tracking [44.59367033562385]
One-shot models, which jointly learn detection and identification embeddings, have drawn great attention in multi-object tracking (MOT).
In this paper, we propose a novel reciprocal network (REN) with a self-relation and cross-relation design to better learn task-dependent representations.
We also introduce a scale-aware attention network (SAAN) that prevents semantic level misalignment to improve the association capability of ID embeddings.
arXiv Detail & Related papers (2020-10-23T02:44:59Z)
- Joint Object Detection and Multi-Object Tracking with Graph Neural Networks [32.1359455541169]
We propose a new instance of joint MOT approach based on Graph Neural Networks (GNNs).
We demonstrate the effectiveness of our GNN-based joint MOT approach and achieve state-of-the-art performance for both detection and MOT tasks.
arXiv Detail & Related papers (2020-06-23T17:07:00Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
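The set-based loss rests on a bipartite matching between predictions and ground-truth boxes. Below is a minimal sketch using the Hungarian algorithm (scipy's linear_sum_assignment) with an assumed toy cost of class probability plus L1 box distance; it simplifies DETR's actual matching cost, which also includes a generalized IoU term.

```python
# Minimal bipartite-matching sketch in the spirit of DETR's set prediction
# (toy cost terms and weights; not DETR's full matching cost).
import torch
from scipy.optimize import linear_sum_assignment


def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes):
    """pred_logits: (Q, K) class logits, pred_boxes: (Q, 4) normalized boxes,
    gt_labels: (G,), gt_boxes: (G, 4). Returns matched (query_idx, gt_idx) pairs."""
    prob = pred_logits.softmax(-1)                      # (Q, K)
    cost_class = -prob[:, gt_labels]                    # (Q, G): negative prob of the true class
    cost_bbox = torch.cdist(pred_boxes, gt_boxes, p=1)  # (Q, G): L1 distance between boxes
    cost = cost_class + 5.0 * cost_bbox                 # weighted total cost
    q_idx, g_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return list(zip(q_idx.tolist(), g_idx.tolist()))


# Example: 100 queries matched one-to-one against 3 ground-truth objects.
matches = hungarian_match(torch.randn(100, 92), torch.rand(100, 4),
                          torch.tensor([1, 5, 17]), torch.rand(3, 4))
```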
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
- EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.