Rethinking the competition between detection and ReID in Multi-Object
Tracking
- URL: http://arxiv.org/abs/2010.12138v3
- Date: Tue, 24 May 2022 11:48:32 GMT
- Title: Rethinking the competition between detection and ReID in Multi-Object
Tracking
- Authors: Chao Liang, Zhipeng Zhang, Xue Zhou, Bing Li, Shuyuan Zhu, Weiming Hu
- Abstract summary: One-shot models, which jointly learn detection and identification embeddings, have drawn great attention in multi-object tracking (MOT).
In this paper, we propose a novel reciprocal network (REN) with a self-relation and cross-relation design to better learn task-dependent representations.
We also introduce a scale-aware attention network (SAAN) that prevents semantic level misalignment to improve the association capability of ID embeddings.
- Score: 44.59367033562385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Owing to their balance of accuracy and speed, one-shot models, which jointly learn
detection and identification embeddings, have drawn great attention in
multi-object tracking (MOT). However, the inherent differences and relations
between detection and re-identification (ReID) are overlooked
because the one-shot tracking paradigm treats them as two isolated tasks.
This leads to inferior performance compared with existing two-stage
methods. In this paper, we first dissect the reasoning process for these two
tasks, which reveals that the competition between them inevitably destroys
the learning of task-dependent representations. To tackle this problem, we propose a
novel reciprocal network (REN) with a self-relation and cross-relation design
that impels each branch to better learn task-dependent representations.
The proposed model aims to alleviate the deleterious task competition
while improving the cooperation between detection and ReID. Furthermore, we
introduce a scale-aware attention network (SAAN) that prevents semantic level
misalignment to improve the association capability of ID embeddings. By
integrating the two delicately designed networks into a one-shot online MOT
system, we construct a strong MOT tracker, namely CSTrack. Our tracker achieves
state-of-the-art performance on the MOT16, MOT17 and MOT20 datasets, without
other bells and whistles. Moreover, CSTrack is efficient and runs at 16.4 FPS
on a single modern GPU, and its lightweight version even runs at 34.6 FPS. The
complete code has been released at https://github.com/JudasDie/SOTS.
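The abstract refers to the "association capability of ID embeddings". As a rough illustration of what that association step involves in an embedding-based tracker, the sketch below matches existing tracks to new detections by cosine similarity of their embeddings. It is not taken from CSTrack or the SOTS repository: the function names, the `min_similarity` threshold, and the toy vectors are all hypothetical, and a simple greedy matcher stands in for whatever assignment method a real tracker uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_associate(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedily pair tracks with detections, most similar pairs first.

    Returns a dict mapping track index -> detection index. Pairs whose
    similarity falls below `min_similarity` (an illustrative value, not one
    from the paper) are never matched.
    """
    candidates = []
    for t, t_emb in enumerate(track_embeddings):
        for d, d_emb in enumerate(detection_embeddings):
            sim = cosine_similarity(t_emb, d_emb)
            if sim >= min_similarity:
                candidates.append((sim, t, d))

    # Highest similarity first; each track and each detection is used at most once.
    candidates.sort(key=lambda c: -c[0])
    matches = {}
    used_detections = set()
    for _, t, d in candidates:
        if t in matches or d in used_detections:
            continue
        matches[t] = d
        used_detections.add(d)
    return matches


# Toy example: two tracks, three detections (the third resembles neither track).
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
detections = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = greedy_associate(tracks, detections)
print(matches)
```

In this toy setup, each track pairs with the detection whose embedding points in nearly the same direction, and the third detection stays unmatched, so a tracker would typically start a new identity for it. The more discriminative the embeddings, the fewer ambiguous pairs the matcher has to resolve.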
Related papers
- ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model [20.259334882471574]
Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame.
Existing MOT methods excel at accurately tracking multiple objects in real-time across various scenarios.
We propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on bounding boxes.
arXiv Detail & Related papers (2024-08-28T05:53:30Z)
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
- SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking [20.286114226299237]
This paper introduces SMILEtrack, an innovative object tracker with a Siamese network-based Similarity Learning Module (SLM).
The SLM calculates the appearance similarity between two objects, overcoming the limitations of feature descriptors in Separate Detection and Embedding models.
Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames.
arXiv Detail & Related papers (2022-11-16T10:49:48Z)
- RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation [3.356734463419838]
Existing online multiple object tracking (MOT) algorithms often consist of two subtasks, detection and re-identification (ReID).
To increase inference speed and reduce complexity, current methods commonly integrate these two subtasks into a unified framework.
We devise a module named Global Context Disentangling (GCD) that decouples the learned representation into detection-specific and ReID-specific embeddings.
To resolve this restriction, we develop a module, referred to as Guided Transformer (GTE), by combining the powerful reasoning ability of Transformer encoder and deformable attention.
arXiv Detail & Related papers (2021-05-10T13:00:40Z)
- Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy [63.91005999481061]
A practical long-term tracker typically contains three key properties, i.e. an efficient model design, an effective global re-detection strategy and a robust distractor awareness mechanism.
We propose a two-task tracking framework (named DMTrack) to achieve distractor-aware fast tracking via Dynamic convolutions (d-convs) and Multiple object tracking (MOT) philosophy.
Our tracker achieves state-of-the-art performance on the LaSOT, OxUvA, TLP, VOT2018LT and VOT 2019LT benchmarks and runs in real-time (3x faster
arXiv Detail & Related papers (2021-04-25T00:59:53Z)
- Online Multiple Object Tracking with Cross-Task Synergy [120.70085565030628]
We propose a novel unified model with synergy between position prediction and embedding association.
The two tasks are linked by temporal-aware target attention and distractor attention, as well as an identity-aware memory aggregation model.
arXiv Detail & Related papers (2021-04-01T10:19:40Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach, termed FairMOT, based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)