Simple Cues Lead to a Strong Multi-Object Tracker
- URL: http://arxiv.org/abs/2206.04656v7
- Date: Wed, 26 Apr 2023 09:44:03 GMT
- Title: Simple Cues Lead to a Strong Multi-Object Tracker
- Authors: Jenny Seidenschwarz, Guillem Bras\'o, Victor Castro Serrano, Ismail
Elezi, and Laura Leal-Taix\'e
- Abstract summary: We propose a new type of tracking-by-detection (TbD) for Multi-Object Tracking.
We show that a combination of our appearance features with a simple motion model leads to strong tracking results.
Our tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and DanceTrack, achieving state-of-the-art performance.
- Score: 3.7189423451031356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For a long time, the most common paradigm in Multi-Object Tracking was
tracking-by-detection (TbD), where objects are first detected and then
associated over video frames. For association, most models resourced to motion
and appearance cues, e.g., re-identification networks. Recent approaches based
on attention propose to learn the cues in a data-driven manner, showing
impressive results. In this paper, we ask ourselves whether simple good old TbD
methods are also capable of achieving the performance of end-to-end models. To
this end, we propose two key ingredients that allow a standard
re-identification network to excel at appearance-based tracking. We extensively
analyse its failure cases, and show that a combination of our appearance
features with a simple motion model leads to strong tracking results. Our
tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and
DanceTrack, achieving state-of-the-art performance.
https://github.com/dvl-tum/GHOST.
Related papers
- Multi-object Tracking by Detection and Query: an efficient end-to-end manner [23.926668750263488]
Multi-object tracking is advancing through two dominant paradigms: traditional tracking by detection and newly emerging tracking by query.
We propose the tracking-by-detection-and-query paradigm, which is achieved by a Learnable Associator.
Compared to tracking-by-query models, LAID achieves competitive tracking accuracy with notably higher training efficiency.
arXiv Detail & Related papers (2024-11-09T14:38:08Z) - Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z) - DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse
Motion [56.1428110894411]
We propose a large-scale dataset for multi-human tracking, where humans have similar appearance, diverse motion and extreme articulation.
As the dataset contains mostly group dancing videos, we name it "DanceTrack"
We benchmark several state-of-the-art trackers on our dataset and observe a significant performance drop on DanceTrack when compared against existing benchmarks.
arXiv Detail & Related papers (2021-11-29T16:49:06Z) - DEFT: Detection Embeddings for Tracking [3.326320568999945]
We propose an efficient joint detection and tracking model named DEFT.
Our approach relies on an appearance-based object matching network jointly-learned with an underlying object detection network.
DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards.
arXiv Detail & Related papers (2021-02-03T20:00:44Z) - Discriminative Appearance Modeling with Multi-track Pooling for
Real-time Multi-object Tracking [20.66906781151]
In multi-object tracking, the tracker maintains in its memory the appearance and motion information for each object in the scene.
Many approaches model each target in isolation and lack the ability to use all the targets in the scene to jointly update the memory.
We propose a training strategy adapted to multi-track pooling which generates hard tracking episodes online.
arXiv Detail & Related papers (2021-01-28T18:12:39Z) - Chained-Tracker: Chaining Paired Attentive Regression Results for
End-to-End Joint Multiple-Object Detection and Tracking [102.31092931373232]
We propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all the three subtasks into an end-to-end solution.
The two major novelties: chained structure and paired attentive regression, make CTracker simple, fast and effective.
arXiv Detail & Related papers (2020-07-29T02:38:49Z) - End-to-End Multi-Object Tracking with Global Response Map [23.755882375664875]
We present a completely end-to-end approach that takes image-sequence/video as input and outputs directly the located and tracked objects of learned types.
Specifically, with our introduced multi-object representation strategy, a global response map can be accurately generated over frames.
Experimental results based on the MOT16 and MOT17 benchmarks show that our proposed on-line tracker achieved state-of-the-art performance on several tracking metrics.
arXiv Detail & Related papers (2020-07-13T12:30:49Z) - TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
Tracking Any Object dataset consists of 2,907 high resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z) - Robust Visual Object Tracking with Two-Stream Residual Convolutional
Networks [62.836429958476735]
We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking.
Our TS-RCN can be integrated with existing deep learning based visual trackers.
To further improve the tracking performance, we adopt a "wider" residual network ResNeXt as its feature extraction backbone.
arXiv Detail & Related papers (2020-05-13T19:05:42Z) - ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking [80.02322563402758]
One of the core components in online multiple object tracking (MOT) frameworks is associating new detections with existing tracklets.
We introduce a probabilistic autoregressive generative model to score tracklet proposals by directly measuring the likelihood that a tracklet represents natural motion.
arXiv Detail & Related papers (2020-04-16T06:43:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.