The 1st-place Solution for ECCV 2022 Multiple People Tracking in Group
Dance Challenge
- URL: http://arxiv.org/abs/2210.15281v1
- Date: Thu, 27 Oct 2022 09:28:44 GMT
- Authors: Yuang Zhang and Tiancai Wang and Weiyao Lin and Xiangyu Zhang
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present our 1st place solution to the Group Dance Multiple People Tracking
Challenge. Based on MOTR: End-to-End Multiple-Object Tracking with Transformer,
we explore: 1) detect queries as anchors, 2) tracking as query denoising, 3)
joint training on pseudo video clips generated from CrowdHuman dataset, and 4)
using the YOLOX detection proposals for the anchor initialization of detect
queries. Our method achieves 73.4% HOTA on the DanceTrack test set, surpassing
the second-place solution by +6.8% HOTA.
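One of the listed ingredients, joint training on pseudo video clips generated from the static CrowdHuman dataset, can be illustrated by synthesizing a short "clip" from a single image: a crop window slides across the image, so the boxes drift smoothly between frames as they would in real video. This is a minimal sketch of the general idea, not the authors' exact augmentation; the function name, shift magnitudes, and frame count are assumptions.

```python
import numpy as np

def make_pseudo_clip(image, boxes, num_frames=4, max_shift=8, seed=0):
    """Simulate a short video clip from one static image by sliding a
    crop window, so box positions drift smoothly across frames.
    Illustrative sketch only; parameters are assumed, not the paper's.
    `boxes` is an (N, 4) array of (x1, y1, x2, y2) in image coordinates."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    ch, cw = h - max_shift, w - max_shift  # crop size leaves room to slide
    x0 = y0 = 0
    frames, frame_boxes = [], []
    for _ in range(num_frames):
        frames.append(image[y0:y0 + ch, x0:x0 + cw].copy())
        shifted = boxes.astype(float).copy()
        shifted[:, [0, 2]] -= x0  # translate boxes into crop coordinates
        shifted[:, [1, 3]] -= y0
        frame_boxes.append(shifted)
        # random walk of the crop window simulates camera/person motion
        x0 = int(np.clip(x0 + rng.integers(-2, 3), 0, max_shift))
        y0 = int(np.clip(y0 + rng.integers(-2, 3), 0, max_shift))
    return frames, frame_boxes
```

Because every frame is derived from the same image, identity labels are free: each box keeps its index across frames, which is what makes such clips usable for training an end-to-end tracker.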
Related papers
- Multi-object Tracking by Detection and Query: an efficient end-to-end manner [23.926668750263488]
Multi-object tracking is advancing through two dominant paradigms: traditional tracking by detection and newly emerging tracking by query.
We propose the tracking-by-detection-and-query paradigm, realized through a Learnable Associator.
Compared to tracking-by-query models, the resulting tracker, LAID, achieves competitive tracking accuracy with notably higher training efficiency.
arXiv Detail & Related papers (2024-11-09T14:38:08Z)
- 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation [81.50620771207329]
We investigate the effectiveness of static-dominant data and frame sampling on referring video object segmentation (RVOS).
Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge.
arXiv Detail & Related papers (2024-06-11T08:05:26Z)
- Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023 [50.910598799408326]
The Tracking Any Point (TAP) task tracks any physical surface through a video.
Several existing approaches have explored TAP by modeling temporal relationships to obtain smooth point-motion trajectories.
We propose a simple yet effective approach, TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of static points in videos shot by a static camera.
arXiv Detail & Related papers (2024-03-26T13:50:39Z)
- Multiple Object Tracking Challenge Technical Report for Team MT_IoT [41.88133094982688]
We treat MOT as a two-stage task comprising human detection and trajectory matching.
Specifically, we designed an improved human detector and associated most of the detections to preserve the integrity of the motion trajectories.
Without any model merging, our method achieves 66.672 HOTA and 93.971 MOTA on the DanceTrack challenge dataset.
arXiv Detail & Related papers (2022-12-07T12:00:51Z)
- The Second-place Solution for ECCV 2022 Multiple People Tracking in Group Dance Challenge [6.388173902438571]
The method includes two main steps: online short-term tracking using our Cascaded Buffer-IoU (C-BIoU) Tracker, and offline long-term tracking using appearance features and hierarchical clustering.
Our C-BIoU tracker adds buffers to expand the matching space of detections and tracks.
After using our C-BIoU for online tracking, we applied the offline refinement introduced by ReMOTS.
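The buffered matching described above can be sketched as a small utility: both the track box and the detection box are expanded by a margin before computing IoU, which enlarges the matching space so fast-moving dancers still overlap their tracks. The buffer scale and naming here are illustrative assumptions, not the paper's settings.

```python
def buffered_iou(track_box, det_box, buffer_scale=0.3):
    """IoU after expanding both boxes by a buffer margin, in the spirit
    of the Cascaded Buffer-IoU (C-BIoU) tracker described above.
    `buffer_scale` is an illustrative choice, not the paper's setting.
    Boxes are (x1, y1, x2, y2) tuples."""
    def expand(box):
        x1, y1, x2, y2 = box
        bw = (x2 - x1) * buffer_scale  # widen by a fraction of box size
        bh = (y2 - y1) * buffer_scale
        return x1 - bw, y1 - bh, x2 + bw, y2 + bh

    ax1, ay1, ax2, ay2 = expand(track_box)
    bx1, by1, bx2, by2 = expand(det_box)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)
```

With `buffer_scale=0.0` this reduces to plain IoU; with a positive buffer, a detection that has jumped just past the track box still receives a positive matching score.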
arXiv Detail & Related papers (2022-11-24T10:04:09Z)
- 1st Place Solutions for the UVO Challenge 2022 [26.625850534861414]
The method ranks first on the 2nd Unidentified Video Objects (UVO) challenge, achieving AR@100 of 46.8, 64.7, and 32.2 on the limited-data frame track, unlimited-data frame track, and video track, respectively.
arXiv Detail & Related papers (2022-10-18T06:54:37Z)
- D$^{\bf{3}}$: Duplicate Detection Decontaminator for Multi-Athlete Tracking in Sports Videos [44.027619577289144]
Duplicate detection is newly and precisely defined as multiple detection boxes misreporting the same occluded athlete in a single frame.
To address this problem, we design a novel transformer-based Duplicate Detection Decontaminator (D$^3$) for training and a dedicated matching algorithm, Rally-Hungarian (RH).
Our model, which is trained only with volleyball videos, can be applied directly to basketball and soccer videos for MAT.
arXiv Detail & Related papers (2022-09-25T15:46:39Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking [102.31092931373232]
We propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all the three subtasks into an end-to-end solution.
The two major novelties: chained structure and paired attentive regression, make CTracker simple, fast and effective.
arXiv Detail & Related papers (2020-07-29T02:38:49Z)
- Tracking by Instance Detection: A Meta-Learning Approach [99.66119903655711]
We propose a principled three-step approach to build a high-performance tracker.
We build two trackers, named Retina-MAML and FCOS-MAML, based on two modern detectors RetinaNet and FCOS.
Both trackers run in real-time at 40 FPS.
arXiv Detail & Related papers (2020-04-02T05:55:06Z)
- Detection in Crowded Scenes: One Proposal, Multiple Predictions [79.28850977968833]
We propose a proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes.
The key to our approach is to let each proposal predict a set of correlated instances rather than a single one, as in previous proposal-based frameworks.
Our detector obtains a 4.9% AP gain on the challenging CrowdHuman dataset and a 1.0% MR$^{-2}$ improvement on the CityPersons dataset.
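Letting one proposal emit several boxes requires a matching suppression rule: boxes produced by the same proposal should not suppress each other, or the extra predictions for overlapping people would be discarded. A minimal greedy sketch of such a set-aware NMS follows; the threshold and names are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def _iou(a, b):
    """Plain IoU between two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def set_nms(boxes, scores, proposal_ids, iou_thr=0.5):
    """Greedy NMS variant in the spirit of 'one proposal, multiple
    predictions': boxes emitted by the SAME proposal never suppress
    each other, so overlapping people predicted together both survive.
    Illustrative sketch; threshold and names are assumptions."""
    order = np.argsort(-np.asarray(scores))
    suppressed = np.zeros(len(boxes), dtype=bool)
    keep = []
    for i in order:
        if suppressed[i]:
            continue
        keep.append(int(i))
        for j in order:
            if suppressed[j] or j == i:
                continue
            if proposal_ids[i] == proposal_ids[j]:
                continue  # sibling predictions from one proposal survive
            if _iou(boxes[i], boxes[j]) > iou_thr:
                suppressed[j] = True
    return keep
```

With all-distinct `proposal_ids` this reduces to ordinary greedy NMS; shared ids are exactly what preserves the correlated instances in a crowd.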
arXiv Detail & Related papers (2020-03-20T09:48:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.