Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object
Tracking
- URL: http://arxiv.org/abs/2305.12724v1
- Date: Mon, 22 May 2023 05:18:34 GMT
- Title: Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object
Tracking
- Authors: Feng Yan, Weixin Luo, Yujie Zhong, Yiyang Gan, Lin Ma
- Abstract summary: Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.
We present Co-MOT, a simple and effective method to facilitate e2e-MOT by a novel coopetition label assignment with a shadow concept.
- Score: 27.74953961900086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not
surpassed non-end-to-end tracking-by-detection methods. One potential reason is
their label assignment strategy during training, which consistently binds the
tracked objects with tracking queries and then assigns the few newborns to
detection queries. With one-to-one bipartite matching, such an assignment will
yield unbalanced training, i.e., scarce positive samples for detection queries,
especially for an enclosed scene, as the majority of the newborns come on stage
at the beginning of videos. Thus, compared with tracking-by-detection methods,
e2e-MOT is more prone to terminating tracks without renewal or
re-initialization. To alleviate this problem, we present Co-MOT, a
simple and effective method to facilitate e2e-MOT by a novel coopetition label
assignment with a shadow concept. Specifically, we add tracked objects to the
matching targets for detection queries when performing the label assignment for
training the intermediate decoders. For query initialization, we expand each
query by a set of shadow counterparts with limited disturbance to itself. With
extensive ablations, Co-MOT achieves superior performance without extra costs,
e.g., 69.4% HOTA on DanceTrack and 52.8% TETA on BDD100K. Impressively, Co-MOT
only requires 38% of the FLOPs of MOTRv2 to attain similar performance, resulting
in a 1.4× faster inference speed.
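As a concrete reading of the two mechanisms above, the following is a minimal PyTorch-style sketch. The function names, the Gaussian form of the "limited disturbance", and the per-decoder switch are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the coopetition label assignment and the shadow-query
# expansion described in the abstract; names and the Gaussian noise are
# assumptions, not the Co-MOT code.
import torch


def expand_with_shadows(queries: torch.Tensor, num_shadows: int = 3,
                        sigma: float = 0.01) -> torch.Tensor:
    """Expand each query into a group of shadow counterparts by adding a
    limited disturbance (modelled here as small Gaussian noise)."""
    # queries: (num_queries, dim) -> (num_queries * (1 + num_shadows), dim)
    shadows = queries.unsqueeze(1).repeat(1, num_shadows, 1)
    shadows = shadows + sigma * torch.randn_like(shadows)
    grouped = torch.cat([queries.unsqueeze(1), shadows], dim=1)
    return grouped.flatten(0, 1)


def detection_query_targets(newborn_gt: list, tracked_gt: list,
                            intermediate_decoder: bool) -> list:
    """Coopetition label assignment: when training an intermediate decoder,
    tracked objects are added to the matching targets of the detection
    queries, giving them far more positive samples than newborns alone;
    the final decoder keeps the usual exclusive assignment."""
    if intermediate_decoder:
        return newborn_gt + tracked_gt
    return newborn_gt
```

In this reading, the one-to-one bipartite matching itself is unchanged; only the set of ground-truth targets that detection queries are allowed to match grows at the intermediate decoders.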
Related papers
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z) - Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking [15.533652456081374]
Multi-object tracking (MOT) endeavors to precisely estimate identities and positions of multiple objects over time.
Modern detectors may occasionally miss some objects in certain frames, causing trackers to cease tracking prematurely.
We propose BUSCA, meaning 'to search', a versatile framework compatible with any online TbD system.
arXiv Detail & Related papers (2024-07-14T10:45:12Z) - Multiple Object Tracking as ID Prediction [14.890192237433771]
In Multiple Object Tracking (MOT), tracking-by-detection methods have stood the test for a long time.
They leverage single-frame detectors and treat object association as a post-processing step through hand-crafted algorithms and surrogate tasks.
However, the nature of these techniques prevents end-to-end exploitation of training data, making manual modifications increasingly cumbersome and challenging.
arXiv Detail & Related papers (2024-03-25T15:09:54Z) - Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z) - SparseTrack: Multi-Object Tracking by Performing Scene Decomposition
based on Pseudo-Depth [84.64121608109087]
We propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images.
We then design a depth cascading matching (DCM) algorithm, which uses the obtained depth information to convert a dense target set into multiple sparse target subsets (a minimal sketch of such a cascade appears after this list).
By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.
arXiv Detail & Related papers (2023-06-08T14:36:10Z) - Real-time Multi-Object Tracking Based on Bi-directional Matching [0.0]
This study offers a bi-directional matching algorithm for multi-object tracking.
A stranded area is used in the matching algorithm to temporarily store the objects that fail to be tracked.
In the MOT17 challenge, the proposed algorithm achieves 63.4% MOTA, 55.3% IDF1, and 20.1 FPS tracking speed.
arXiv Detail & Related papers (2023-03-15T08:38:08Z) - MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained
Object Detectors [14.69168925956635]
MOTRv2 is a pipeline to bootstrap end-to-end multi-object tracking with a pretrained object detector.
It ranks 1st (73.4% HOTA on DanceTrack) in the 1st Multiple People Tracking in Group Dance Challenge.
It reaches state-of-the-art performance on the BDD100K dataset.
arXiv Detail & Related papers (2022-11-17T18:57:12Z) - Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z) - DEFT: Detection Embeddings for Tracking [3.326320568999945]
We propose an efficient joint detection and tracking model named DEFT.
Our approach relies on an appearance-based object matching network jointly-learned with an underlying object detection network.
DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards.
arXiv Detail & Related papers (2021-02-03T20:00:44Z) - Chained-Tracker: Chaining Paired Attentive Regression Results for
End-to-End Joint Multiple-Object Detection and Tracking [102.31092931373232]
We propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all the three subtasks into an end-to-end solution.
The two major novelties: chained structure and paired attentive regression, make CTracker simple, fast and effective.
arXiv Detail & Related papers (2020-07-29T02:38:49Z) - Tracking by Instance Detection: A Meta-Learning Approach [99.66119903655711]
We propose a principled three-step approach to build a high-performance tracker.
We build two trackers, named Retina-MAML and FCOS-MAML, based on two modern detectors RetinaNet and FCOS.
Both trackers run in real-time at 40 FPS.
arXiv Detail & Related papers (2020-04-02T05:55:06Z)
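For the SparseTrack entry above, here is a minimal, hypothetical sketch of what a depth-cascading association step could look like, assuming pseudo-depth values are already available for every track and detection and that a standard Hungarian solver (scipy.optimize.linear_sum_assignment) is used inside each depth slice. The function name depth_cascade_match, the quantile-based depth levels, and the gating threshold are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical illustration of a depth-cascading association step (not the
# SparseTrack code): detections are split into pseudo-depth levels and matched
# level by level, so each assignment problem stays small and sparse.
import numpy as np
from scipy.optimize import linear_sum_assignment


def depth_cascade_match(track_depths, det_depths, cost,
                        num_levels=3, max_cost=0.8):
    """Match tracks to detections level by level, from near to far.

    `cost` is a (num_tracks, num_dets) association cost matrix (e.g. 1 - IoU);
    tracks left unmatched at a shallow level remain candidates at deeper levels.
    """
    depths = np.concatenate([track_depths, det_depths])
    edges = np.quantile(depths, np.linspace(0.0, 1.0, num_levels + 1))
    matches = []
    free_tracks = list(range(len(track_depths)))
    free_dets = list(range(len(det_depths)))
    for lvl in range(num_levels):
        lo, hi = edges[lvl], edges[lvl + 1]
        # Detections in the current depth slice; tracks up to this depth.
        d_idx = [d for d in free_dets if lo <= det_depths[d] <= hi]
        t_idx = [t for t in free_tracks if track_depths[t] <= hi]
        if not t_idx or not d_idx:
            continue
        sub = cost[np.ix_(t_idx, d_idx)]
        rows, cols = linear_sum_assignment(sub)
        for r, c in zip(rows, cols):
            if sub[r, c] <= max_cost:  # assumed gating threshold
                matches.append((t_idx[r], d_idx[c]))
                free_tracks.remove(t_idx[r])
                free_dets.remove(d_idx[c])
    return matches, free_tracks, free_dets
```

The point of the cascade is that each call to the assignment solver only sees a small subset of targets, which is what the DCM description above means by converting a dense target set into multiple sparse target subsets.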