Related papers: Glance-MCMT: A General MCMT Framework with Glance Initialization and Progressive Association

URL: http://arxiv.org/abs/2507.10115v1
Date: Mon, 14 Jul 2025 09:57:53 GMT
Title: Glance-MCMT: A General MCMT Framework with Glance Initialization and Progressive Association
Authors: Hamidreza Hashempoor,
Abstract summary: We propose a multi-camera multi-target (MCMT) tracking framework that ensures consistent global identity assignment across views. The pipeline starts with BoT-SORT-based single-camera tracking, followed by an initial glance phase to initialize global IDs. New global IDs are only introduced when no sufficiently similar trajectory or feature match is found.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We propose a multi-camera multi-target (MCMT) tracking framework that ensures consistent global identity assignment across views using trajectory and appearance cues. The pipeline starts with BoT-SORT-based single-camera tracking, followed by an initial glance phase to initialize global IDs via trajectory-feature matching. In later frames, new tracklets are matched to existing global identities through a prioritized global matching strategy. New global IDs are only introduced when no sufficiently similar trajectory or feature match is found. 3D positions are estimated using depth maps and calibration for spatial validation.
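The association rule described in the abstract (match a new tracklet to an existing global identity, and open a new global ID only when nothing is similar enough) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `GlobalIdentity`, `backproject`, and `assign_global_id`, the cosine-similarity measure, the two thresholds, and the pinhole intrinsics are all assumptions chosen for illustration. The camera-to-world extrinsic transform is omitted, so the example treats the camera frame as the shared world frame.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GlobalIdentity:
    """Hypothetical record for one global ID (not from the paper)."""
    gid: int
    feature: List[float]                   # appearance embedding
    position: Tuple[float, float, float]   # last known 3D position


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth to a
    camera-frame 3D point. Extrinsics are omitted in this sketch."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_global_id(feature, position, gallery,
                     sim_thresh=0.6, dist_thresh=1.5):
    """Return the global ID for a new tracklet.

    Candidates farther than dist_thresh (assumed metres) are rejected
    first (spatial validation); among the rest, the most similar
    appearance above sim_thresh wins. If nothing qualifies, a new
    global ID is created. Both thresholds are illustrative values.
    """
    best, best_sim = None, sim_thresh
    for ident in gallery:
        if math.dist(position, ident.position) > dist_thresh:
            continue  # spatially implausible match
        sim = cosine_similarity(feature, ident.feature)
        if sim >= best_sim:
            best, best_sim = ident, sim
    if best is None:
        new_gid = max((g.gid for g in gallery), default=-1) + 1
        best = GlobalIdentity(new_gid, list(feature), tuple(position))
        gallery.append(best)
    else:
        best.position = tuple(position)  # keep the identity's position current
    return best.gid


# Usage: three tracklets, the second re-observes the first identity.
gallery = []
p0 = backproject(320, 240, 5.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
first = assign_global_id([1.0, 0.0], p0, gallery)
second = assign_global_id([0.9, 0.1], (0.2, 0.0, 5.1), gallery)
third = assign_global_id([0.0, 1.0], (0.1, 0.0, 5.0), gallery)
print(first, second, third)
```

In this sketch the spatial check acts as a gate before appearance is compared, which is one plausible reading of "spatial validation"; the paper's prioritized global matching strategy may order or combine these cues differently.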

Related papers

VISTA: Monocular Segmentation-Based Mapping for Appearance and View-Invariant Global Localization [0.2356141385409842]
VISTA is a novel open-set, monocular global localization framework. It exploits geometric consistencies between environment maps to align reference frames. We evaluate VISTA on seasonal and oblique-angle aerial datasets, achieving up to a 69% improvement in recall over baseline methods.
arXiv Detail & Related papers (2025-07-15T18:38:35Z)
Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID [82.12123628480371]
Unsupervised visible-infrared person re-identification (USL-VI-ReID) seeks to match pedestrian images of the same individual across different modalities without human annotations for model learning. Previous methods unify pseudo-labels of cross-modality images through label association algorithms and then design a contrastive learning framework for global feature learning. We propose a Semantic-Aligned Learning with Collaborative Refinement (SALCR) framework, which builds up an objective for the specific fine-grained patterns emphasized by each modality.
arXiv Detail & Related papers (2025-04-27T13:58:12Z)
Incremental Multiview Point Cloud Registration [18.830104930321223]
We propose an incremental pipeline to progressively align scans into a canonical coordinate system. For detector-free matchers, we incorporate a Track refinement process. Experiments demonstrate that the proposed framework outperforms existing multiview registration methods on three benchmark datasets.
arXiv Detail & Related papers (2024-07-06T09:28:23Z)
Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking [14.47355191520578]
Point cloud-based 3D object tracking is an important task in autonomous driving. It remains challenging to learn the correlation between the template and search branches effectively with the sparse LIDAR point cloud data. We present a multi-correlation Siamese Transformer network that has multiple stages and carries out feature correlation at the end of each stage.
arXiv Detail & Related papers (2023-12-18T09:33:49Z)
CP-SLAM: Collaborative Neural Point-based SLAM System [54.916578456416204]
This paper presents a collaborative implicit neural simultaneous localization and mapping (SLAM) system with RGB-D image sequences. In order to enable all these modules in a unified framework, we propose a novel neural point based 3D scene representation. A distributed-to-centralized learning strategy is proposed for the collaborative implicit SLAM to improve consistency and cooperation.
arXiv Detail & Related papers (2023-11-14T09:17:15Z)
M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection [22.60675416709486]
M$^3$Net is an attention network for Salient Object Detection. A cross-attention approach achieves the interaction between multilevel features. The Mixed Attention Block aims at modeling context at both global and local levels. A multilevel supervision strategy optimizes the aggregated feature stage-by-stage.
arXiv Detail & Related papers (2023-09-15T12:46:14Z)
End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time. Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
Improving tracking with a tracklet associator [17.839783649372116]
Multiple object tracking (MOT) is a task in computer vision that aims to detect the position of objects in videos and to associate them to a unique identity. We propose an approach based on Constraint Programming (CP) whose goal is to be grafted to any existing tracker in order to improve its object association results.
arXiv Detail & Related papers (2022-04-22T12:47:46Z)
Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images. We propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM). The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on some large datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z)
Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking [53.668757725179056]
We propose a simple yet effective approach to adapt affinity estimations to corresponding matching scopes in MTMCT. Instead of trying to deal with all appearance changes, we tailor the affinity metric to specialize in ones that might emerge during data associations. Minimizing the mismatch, the adaptive affinity module brings significant improvements over global re-ID distance.
arXiv Detail & Related papers (2021-12-14T18:59:11Z)
Dense Scene Multiple Object Tracking with Box-Plane Matching [73.54369833671772]
Multiple Object Tracking (MOT) is an important task in computer vision. We propose the Box-Plane Matching (BPM) method to improve the MOT performance in dense scenes. With the effectiveness of the three modules, our team achieves the 1st place on the Track-1 leaderboard in the ACM MM Grand Challenge HiEve 2020.
arXiv Detail & Related papers (2020-07-30T16:39:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.