D$^{\bf{3}}$: Duplicate Detection Decontaminator for Multi-Athlete
Tracking in Sports Videos
- URL: http://arxiv.org/abs/2209.12248v1
- Date: Sun, 25 Sep 2022 15:46:39 GMT
- Title: D$^{\bf{3}}$: Duplicate Detection Decontaminator for Multi-Athlete
Tracking in Sports Videos
- Authors: Rui He, Zehua Fu, Qingjie Liu, Yunhong Wang, Xunxun Chen
- Abstract summary: Duplicate detection is newly and precisely defined as occlusion misreporting on the same athlete by multiple detection boxes in one frame.
To address this problem, the authors design a novel transformer-based Duplicate Detection Decontaminator (D$^3$) for training, and a specific algorithm, Rally-Hungarian (RH), for matching.
The model, which is trained only with volleyball videos, can be applied directly to basketball and soccer videos for multi-athlete tracking (MAT).
- Score: 44.027619577289144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracking multiple athletes in sports videos is a very challenging
Multi-Object Tracking (MOT) task, since athletes often have the same appearance
and closely occlude one another, which turns the common occlusion problem
into a severe duplicate detection problem. In this paper, duplicate
detection is newly and precisely defined as occlusion misreporting on the same
athlete by multiple detection boxes in one frame. To address this problem, we
design a novel transformer-based Duplicate Detection
Decontaminator (D$^3$) for training, and a specific algorithm, Rally-Hungarian
(RH), for matching. Once duplicate detection occurs, D$^3$ immediately modifies
the procedure by generating enhanced box losses. RH, inspired by the team
sports substitution rules, is well suited to sports videos. Moreover,
to complement tracking datasets without shot changes, we release a new
dataset based on sports video, named RallyTrack. Extensive experiments on
RallyTrack show that combining D$^3$ and RH improves
tracking performance, with gains of 9.2 in MOTA and 4.5 in HOTA. Meanwhile, experiments
on the MOT series and DanceTrack show that D$^3$ can accelerate convergence
during training, saving up to 80 percent of the original training time
on MOT17. Finally, our model, which is trained only with volleyball videos, can
be applied directly to basketball and soccer videos for multi-athlete tracking (MAT), which shows
the superiority of our method. Our dataset is available at
https://github.com/heruihr/rallytrack.
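To make the definition of duplicate detection concrete, the following is a minimal sketch that flags pairs of boxes in a single frame that overlap heavily and could therefore be reporting the same athlete. It is not the authors' D$^3$ module or the Rally-Hungarian algorithm: the IoU heuristic, the function names `iou` and `candidate_duplicates`, the `(x1, y1, x2, y2)` box format, and the 0.7 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def candidate_duplicates(boxes, threshold=0.7):
    """Return index pairs (i, j), i < j, whose boxes overlap by at least `threshold`.

    A high overlap only suggests a duplicate; two real athletes who occlude
    each other can also overlap, which is why the threshold is an assumption.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if iou(a, b) >= threshold
    ]


# Hypothetical detections for one frame: the first two boxes nearly coincide.
boxes = [(10, 10, 50, 90), (12, 11, 52, 92), (200, 40, 240, 120)]
pairs = candidate_duplicates(boxes)
```

A purely geometric check like this cannot tell a true duplicate from two tightly occluding players, which is the ambiguity the abstract describes as occlusion misreporting.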
Related papers
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
- TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos [11.35998213546475]
Multi-object tracking (MOT) is a critical and challenging task in computer vision.
We introduce TeamTrack, a pioneering benchmark dataset specifically designed for MOT in sports.
TeamTrack is an extensive collection of full-pitch video data from various sports, including soccer, basketball, and handball.
arXiv Detail & Related papers (2024-04-22T04:33:40Z)
- SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes [44.46768991505495]
We present a new large-scale multi-object tracking dataset in diverse sports scenes, coined as SportsMOT.
It consists of 240 video sequences, over 150K frames and over 1.6M bounding boxes collected from 3 sports categories, including basketball, volleyball and football.
We propose a new multi-object tracking framework, termed as MixSort, introducing a MixFormer-like structure as an auxiliary association model to prevailing tracking-by-detection trackers.
arXiv Detail & Related papers (2023-04-11T12:07:31Z)
- ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames.
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes.
In 3D scenarios, it is much easier for the tracker to predict object velocities in the world coordinate.
arXiv Detail & Related papers (2023-03-27T15:35:21Z)
- Unifying Tracking and Image-Video Object Detection [54.91658924277527]
TrIVD (Tracking and Image-Video Detection) is the first framework that unifies image OD, video OD, and MOT within one end-to-end model.
To handle the discrepancies and semantic overlaps of category labels, TrIVD formulates detection/tracking as grounding and reasons about object categories.
arXiv Detail & Related papers (2022-11-20T20:30:28Z)
- P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos [64.57435509822416]
This work consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads.
We formulate two sets of action detection problems: action localization and action recognition.
The results confirm that P2ANet is still a challenging benchmark and can be used as a special benchmark for dense action detection from videos.
arXiv Detail & Related papers (2022-07-26T08:34:17Z)
- TDT: Teaching Detectors to Track without Fully Annotated Videos [2.8292841621378844]
One-stage trackers that predict both detections and appearance embeddings in one forward pass received much attention.
Our proposed one-stage solution matches the two-stage counterpart in quality but is 3 times faster.
arXiv Detail & Related papers (2022-05-11T15:56:17Z)
- SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos [62.686484228479095]
We propose a novel dataset for multiple object tracking composed of 200 sequences of 30s each.
The dataset is fully annotated with bounding boxes and tracklet IDs.
Our analysis shows that multiple player, referee and ball tracking in soccer videos is far from being solved.
arXiv Detail & Related papers (2022-04-14T12:22:12Z)
- Event detection in coarsely annotated sports videos via parallel multi receptive field 1D convolutions [14.30009544149561]
In problems such as sports video analytics, it is difficult to obtain accurate frame level annotations and exact event duration.
We propose the task of event detection in coarsely annotated videos.
We introduce a multi-tower temporal convolutional network architecture for the proposed task.
arXiv Detail & Related papers (2020-04-13T19:51:25Z)