End-to-End Multi-Object Tracking with Global Response Map
- URL: http://arxiv.org/abs/2007.06344v1
- Date: Mon, 13 Jul 2020 12:30:49 GMT
- Title: End-to-End Multi-Object Tracking with Global Response Map
- Authors: Xingyu Wan, Jiakai Cao, Sanping Zhou, Jinjun Wang
- Abstract summary: We present a completely end-to-end approach that takes image-sequence/video as input and outputs directly the located and tracked objects of learned types.
Specifically, with our introduced multi-object representation strategy, a global response map can be accurately generated over frames.
- Experimental results based on the MOT16 and MOT17 benchmarks show that our proposed online tracker achieves state-of-the-art performance on several tracking metrics.
- Score: 23.755882375664875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing Multi-Object Tracking (MOT) approaches follow the
Tracking-by-Detection paradigm and the data-association framework, where
objects are first detected and then associated. Although deep-learning-based
methods can noticeably improve object detection performance and also provide
good appearance features for cross-frame association, the framework is not
completely end-to-end; the computational cost is therefore high while the
performance remains limited. To address the problem, we present a completely
end-to-end approach that takes image-sequence/video as input and outputs
directly the located and tracked objects of learned types. Specifically, with
our introduced multi-object representation strategy, a global response map can
be accurately generated over frames, from which the trajectory of each tracked
object can be easily picked up, just as a detector takes an image as input
and outputs bounding boxes for each detected object. The proposed model is fast
and accurate. Experimental results based on the MOT16 and MOT17 benchmarks show
that our proposed online tracker achieves state-of-the-art performance on
several tracking metrics.
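The abstract does not spell out how trajectories are "picked up" from the global response map, but a common way to decode such a heatmap is to keep local maxima above a confidence threshold, as center-based detectors do. The sketch below illustrates that idea; the 3x3 local-maximum rule, the threshold value, and the NumPy implementation are assumptions for illustration, not the paper's actual decoding step.

```python
import numpy as np

def pick_peaks(response_map, threshold=0.5):
    """Extract candidate object centers from a 2D response map.

    A pixel is kept if it attains the maximum of its 3x3 neighborhood
    (ties included) and its response exceeds `threshold`. This mirrors
    the heatmap decoding used by center-based detectors; it is an
    illustrative assumption, not the paper's specified procedure.
    """
    h, w = response_map.shape
    # Pad with -inf so border pixels compare correctly against "outside".
    padded = np.pad(response_map, 1, constant_values=-np.inf)
    # Maximum over every pixel's 3x3 neighborhood, computed by stacking
    # the nine shifted views of the padded map.
    neighborhood_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    keep = (response_map == neighborhood_max) & (response_map > threshold)
    ys, xs = np.nonzero(keep)
    scores = response_map[ys, xs]
    return list(zip(ys.tolist(), xs.tolist(), scores.tolist()))
```

Running this per frame yields one peak per object; a naive tracker could then link peaks across consecutive frames by nearest-neighbor matching, whereas the paper's end-to-end model produces the cross-frame association directly.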
Related papers
- Matching Anything by Segmenting Anything [109.2507425045143]
We propose MASA, a novel method for robust instance association learning.
MASA learns instance-level correspondence through exhaustive data transformations.
We show that MASA achieves even better performance than state-of-the-art methods trained with fully annotated in-domain video sequences.
arXiv Detail & Related papers (2024-06-06T16:20:07Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z)
- UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with Geometric Topology Guidance [6.577227592760559]
UnsMOT is a novel framework that combines appearance and motion features of objects with geometric information to provide more accurate tracking.
Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2023-09-03T04:58:12Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos [36.28269135795851]
We present a set classifier that improves accuracy of classifying tracklets by aggregating information from multiple viewpoints contained in a tracklet.
By simply attaching our method to QDTrack on top of ResNet-101, we achieve a new state of the art: 19.9% and 15.7% TrackAP_50 on the TAO validation and test sets.
arXiv Detail & Related papers (2022-06-05T07:51:58Z)
- DSRRTracker: Dynamic Search Region Refinement for Attention-based Siamese Multi-Object Tracking [13.104037155691644]
We propose an end-to-end MOT method, with a Gaussian filter-inspired dynamic search region refinement module.
Our method can achieve the state-of-the-art performance with reasonable speed.
arXiv Detail & Related papers (2022-03-21T04:14:06Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
- Ensembling object detectors for image and video data analysis [98.26061123111647]
We propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data.
We extend it to video data by proposing a two-stage tracking-based scheme for detection refinement.
arXiv Detail & Related papers (2021-02-09T12:38:16Z)
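The detector-ensembling idea in the last entry can be illustrated with a minimal fusion sketch. The greedy IoU clustering and score-weighted box averaging below are generic assumptions in the spirit of weighted box fusion, not the cited paper's actual scheme.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def ensemble_boxes(detections, iou_thr=0.5):
    """Fuse boxes pooled from several detectors.

    `detections` is a flat list of (box, score) pairs. Boxes are greedily
    clustered by IoU around the highest-scoring box, then each cluster is
    collapsed into one score-weighted average box. A generic illustrative
    scheme, not the specific method of the cited paper.
    """
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    fused = []
    while detections:
        box, score = detections.pop(0)
        cluster, rest = [(box, score)], []
        for b, s in detections:
            (cluster if iou(box, b) >= iou_thr else rest).append((b, s))
        detections = rest
        ws = np.array([s for _, s in cluster])
        bs = np.array([b for b, _ in cluster], dtype=float)
        fused_box = (bs * ws[:, None]).sum(axis=0) / ws.sum()
        fused.append((fused_box.tolist(), float(ws.mean())))
    return fused
```

The cited paper's second, tracking-based refinement stage would then operate on these fused boxes across frames; that stage is not modeled here.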
This list is automatically generated from the titles and abstracts of the papers in this site.