GMT: A Robust Global Association Model for Multi-Target Multi-Camera Tracking
- URL: http://arxiv.org/abs/2407.01007v1
- Date: Mon, 1 Jul 2024 06:39:14 GMT
- Title: GMT: A Robust Global Association Model for Multi-Target Multi-Camera Tracking
- Authors: Huijie Fan, Tinghui Zhao, Qiang Wang, Baojie Fan, Yandong Tang, LianQing Liu
- Abstract summary: We propose a global online MTMC tracking model that addresses the dependency on the first tracking stage in two-step methods and enhances cross-camera matching.
Specifically, we propose a transformer-based global MTMC association module to explore target associations across different cameras and frames.
To accommodate high scene diversity and complex lighting condition variations, we have established the VisionTrack dataset.
- Score: 13.305411087116635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-target multi-camera (MTMC) tracking of pedestrians, data association is the key challenge, especially under camera movement, lighting variation, and occlusion. Most MTMC models adopt two-step approaches and therefore depend heavily on the results of the first tracking step in practical applications. Moreover, the same target crossing different cameras may exhibit significant appearance variations, which further increases the difficulty of cross-camera matching. To address these issues, we propose a global online MTMC tracking model that removes the dependency on the first tracking stage of two-step methods and enhances cross-camera matching. Specifically, we propose a transformer-based global MTMC association module that explores target associations across different cameras and frames and generates global trajectories directly. Additionally, to integrate the appearance and spatio-temporal features of targets, we propose a feature extraction and fusion module for MTMC tracking; it enhances feature representation and establishes correlations between the features of targets across multiple cameras. To accommodate high scene diversity and complex lighting variations, we have established the VisionTrack dataset, which enables the development of models that are more generalized and robust to diverse environments. Our model demonstrates significant improvements over comparison methods on the VisionTrack dataset and others.
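The abstract describes the association module only at a high level; the idea can be sketched roughly as follows in PyTorch, assuming a standard transformer encoder over fused detection tokens (all module names, dimensions, and the fusion and matching choices below are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class GlobalAssociationSketch(nn.Module):
    """Hypothetical sketch of a transformer-based global MTMC association
    module: appearance and spatio-temporal cues are fused per detection,
    then self-attention relates detections across cameras and frames."""

    def __init__(self, app_dim=256, st_dim=8, d_model=256, nhead=8, nlayers=2):
        super().__init__()
        # Fusion of appearance (e.g. a ReID embedding) and spatio-temporal
        # features (e.g. box, camera id, timestamp encodings) -- assumed design.
        self.fuse = nn.Linear(app_dim + st_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=nlayers)

    def forward(self, app_feats, st_feats):
        # app_feats: (N, app_dim) appearance features for N detections
        # st_feats:  (N, st_dim) spatio-temporal features for the same detections
        tokens = self.fuse(torch.cat([app_feats, st_feats], dim=-1)).unsqueeze(0)
        tokens = self.encoder(tokens).squeeze(0)          # (N, d_model)
        tokens = nn.functional.normalize(tokens, dim=-1)
        # Pairwise similarities act as a global association matrix; a matching
        # step (e.g. Hungarian) on this matrix would yield global trajectories.
        return tokens @ tokens.T                           # (N, N)

# Toy usage: 5 detections pooled from all cameras and frames.
assoc = GlobalAssociationSketch()(torch.randn(5, 256), torch.randn(5, 8))
print(assoc.shape)  # torch.Size([5, 5])
```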
Related papers
- MCTR: Multi Camera Tracking Transformer [45.66952089591361]
Multi-Camera Tracking tRansformer (MCTR) is a novel end-to-end approach tailored for multi-object detection and tracking across multiple cameras.
MCTR leverages end-to-end detectors like DEtector TRansformer (DETR) to produce detections and detection embeddings independently for each camera view.
The framework maintains a set of track embeddings that encapsulate global information about the tracked objects, and updates them at every frame by integrating local information from the view-specific detection embeddings.
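A minimal sketch of that update step, assuming standard cross-attention (the names, dimensions, and residual update rule are illustrative, not MCTR's actual code):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: track embeddings are updated each frame by
# cross-attending to per-view detection embeddings, as the summary describes.
d = 256
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)

tracks = torch.randn(1, 10, d)       # 10 global track embeddings
view_dets = [torch.randn(1, 7, d),   # detection embeddings from camera 1
             torch.randn(1, 5, d)]   # detection embeddings from camera 2

for dets in view_dets:
    # Tracks are queries; each view's detections are keys/values. A residual
    # update keeps global information while integrating local evidence.
    delta, _ = cross_attn(query=tracks, key=dets, value=dets)
    tracks = tracks + delta

print(tracks.shape)  # torch.Size([1, 10, 256])
```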
arXiv Detail & Related papers (2024-08-23T17:37:03Z)
- MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark [63.878793340338035]
Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras.
Existing datasets for this task are either synthetically generated or artificially constructed within a controlled camera network setting.
We present MTMMC, a real-world, large-scale dataset that includes long video sequences captured by 16 multi-modal cameras in two different environments.
arXiv Detail & Related papers (2024-03-29T15:08:37Z)
- Towards Effective Multi-Moving-Camera Tracking: A New Dataset and Lightweight Link Model [4.581852145863394]
Multi-target multi-camera (MTMC) tracking systems are composed of two modules: single-camera tracking (SCT) and inter-camera tracking (ICT).
MTMC tracking is already a complicated task, and tracking across multiple moving cameras makes it even more challenging.
A Linker module is proposed to mitigate identity switches by associating two disjoint tracklets of the same target into a complete trajectory within the same camera.
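A minimal sketch of such a linking step, under assumed scoring rules (cosine similarity of tracklet features plus a bound on the temporal gap; all thresholds are illustrative, not the paper's):

```python
import torch

def link_tracklets(tracklets, sim_thresh=0.7, max_gap=30):
    """Hypothetical sketch of single-camera tracklet linking: merge two
    disjoint tracklets of the same target into one trajectory when their
    appearance features match and their time ranges do not overlap."""
    merged = True
    while merged:
        merged = False
        for i in range(len(tracklets)):
            for j in range(len(tracklets)):
                if i == j:
                    continue
                a, b = tracklets[i], tracklets[j]
                gap = b["start"] - a["end"]          # frames between tracklets
                sim = torch.dot(a["feat"], b["feat"]).item()
                if 0 < gap <= max_gap and sim > sim_thresh:
                    # Merge b into a: extend the time span, average features.
                    a["end"] = b["end"]
                    a["feat"] = torch.nn.functional.normalize(
                        (a["feat"] + b["feat"]) / 2, dim=0)
                    tracklets.pop(j)
                    merged = True
                    break
            if merged:
                break
    return tracklets

# Toy usage: two fragments of the same identity with a 10-frame gap.
f = torch.nn.functional.normalize(torch.randn(128), dim=0)
frags = [{"start": 0, "end": 50, "feat": f},
         {"start": 60, "end": 120, "feat": f.clone()}]
print(len(link_tracklets(frags)))  # 1
```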
arXiv Detail & Related papers (2023-12-18T09:11:28Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
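A minimal sketch of what such a light bi-directional adapter could look like, assuming a bottleneck MLP per transfer direction (dimensions, activations, and the residual form are illustrative assumptions, not the paper's design):

```python
import torch
import torch.nn as nn

class BiDirectionalAdapter(nn.Module):
    """Hypothetical sketch of a light bi-directional adapter: a bottleneck
    projection per direction transfers modality-specific information
    between two feature streams (e.g. RGB and thermal)."""

    def __init__(self, dim=256, bottleneck=32):
        super().__init__()
        # One cheap down-project / up-project pair per transfer direction.
        self.a_to_b = nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(),
                                    nn.Linear(bottleneck, dim))
        self.b_to_a = nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(),
                                    nn.Linear(bottleneck, dim))

    def forward(self, feat_a, feat_b):
        # Each stream receives a residual adapted from the other modality.
        return feat_a + self.b_to_a(feat_b), feat_b + self.a_to_b(feat_a)

rgb, thermal = torch.randn(4, 256), torch.randn(4, 256)
rgb2, thermal2 = BiDirectionalAdapter()(rgb, thermal)
print(rgb2.shape, thermal2.shape)  # torch.Size([4, 256]) torch.Size([4, 256])
```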
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Video sequences generated by our TrackDiffusion can be used as training data for visual perception models.
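A minimal sketch of tracklet conditioning under broad assumptions (box projection plus identity embeddings fed to a generator's cross-attention; TrackDiffusion's actual conditioning and instance enhancer are not reproduced here):

```python
import torch
import torch.nn as nn

class TrackletConditioner(nn.Module):
    """Hypothetical sketch of tracklet conditioning for video generation:
    per-frame boxes plus instance ids become embeddings that a diffusion
    model could cross-attend to for trajectory-controlled motion."""

    def __init__(self, num_ids=64, d_model=256):
        super().__init__()
        self.box_proj = nn.Linear(4, d_model)   # (x, y, w, h) per frame
        self.id_embed = nn.Embedding(num_ids, d_model)

    def forward(self, boxes, ids):
        # boxes: (T, N, 4) tracklet boxes over T frames for N instances
        # ids:   (N,) instance identities, shared across frames so the
        #        same embedding recurs every frame (inter-frame consistency)
        return self.box_proj(boxes) + self.id_embed(ids).unsqueeze(0)

cond = TrackletConditioner()(torch.rand(16, 3, 4), torch.tensor([0, 1, 2]))
print(cond.shape)  # torch.Size([16, 3, 256])
```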
arXiv Detail & Related papers (2023-12-01T15:24:38Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method achieves significant improvements on the MOT17 and MOT20 datasets while reaching state-of-the-art performance on the DanceTrack dataset.
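A minimal sketch of the two feature types named in the summary, with the aggregation rule assumed (attention over a target's history; the paper's actual learning paradigm may differ):

```python
import torch

# Hypothetical sketch: a single-shot feature comes from the current
# detection, while a multi-shot feature aggregates a target's recent history.
def multi_shot_feature(history, query):
    # history: (T, D) past single-shot features of one target
    # query:   (D,)   current single-shot feature
    weights = torch.softmax(history @ query, dim=0)   # attend over history
    return (weights.unsqueeze(1) * history).sum(dim=0)

history = torch.nn.functional.normalize(torch.randn(8, 128), dim=1)
query = torch.nn.functional.normalize(torch.randn(128), dim=0)
fused = multi_shot_feature(history, query)
print(fused.shape)  # torch.Size([128])
```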
arXiv Detail & Related papers (2023-11-17T08:17:49Z)
- DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking [25.98400206361454]
Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging use in several real-world applications.
This work proposes a new Dynamic Graph Model with Link Prediction approach to solve the data association task.
Experimental results show that we outperform existing MC-MOT algorithms by a large margin on several practical datasets.
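A minimal sketch of link prediction for data association, assuming an MLP edge scorer over pairs of node features (illustrative only, not DyGLIP's dynamic graph model):

```python
import torch
import torch.nn as nn

class LinkPredictor(nn.Module):
    """Hypothetical sketch of graph link prediction for data association:
    an MLP scores whether two detection nodes (possibly from different
    cameras) belong to the same identity."""

    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 1))

    def forward(self, node_a, node_b):
        # Probability that an edge (same identity) links the two nodes.
        return torch.sigmoid(self.mlp(torch.cat([node_a, node_b], dim=-1)))

p = LinkPredictor()(torch.randn(6, 128), torch.randn(6, 128))
print(p.shape)  # torch.Size([6, 1])
```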
arXiv Detail & Related papers (2021-06-12T20:22:30Z)
- Know Your Surroundings: Panoramic Multi-Object Tracking by Multimodality Collaboration [56.01625477187448]
We propose a MultiModality PAnoramic multi-object Tracking framework (MMPAT).
It takes both 2D panorama images and 3D point clouds as input and then infers target trajectories using the multimodality data.
We evaluate the proposed method on the JRDB dataset, where the MMPAT achieves the top performance in both the detection and tracking tasks.
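A minimal sketch of the multimodality fusion idea, with projection dimensions and the concatenation rule assumed (illustrative only, not MMPAT's architecture):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: per-target features from 2D panorama images and
# 3D point clouds are projected to a shared space and fused for tracking.
class PanoramicFusion(nn.Module):
    def __init__(self, img_dim=256, pts_dim=64, d_model=128):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d_model)
        self.pts_proj = nn.Linear(pts_dim, d_model)

    def forward(self, img_feat, pts_feat):
        # Concatenate per-target 2D and 3D cues for downstream association.
        return torch.cat([self.img_proj(img_feat),
                          self.pts_proj(pts_feat)], dim=-1)

fused = PanoramicFusion()(torch.randn(4, 256), torch.randn(4, 64))
print(fused.shape)  # torch.Size([4, 256])
```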
arXiv Detail & Related papers (2021-05-31T03:16:38Z)
- Dense Scene Multiple Object Tracking with Box-Plane Matching [73.54369833671772]
Multiple Object Tracking (MOT) is an important task in computer vision.
We propose the Box-Plane Matching (BPM) method to improve MOT performance in dense scenes.
Owing to the effectiveness of its three modules, our team achieved 1st place on the Track-1 leaderboard of the ACM MM Grand Challenge HiEve 2020.
arXiv Detail & Related papers (2020-07-30T16:39:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.