Masks and Boxes: Combining the Best of Both Worlds for Multi-Object Tracking
- URL: http://arxiv.org/abs/2409.14220v2
- Date: Thu, 26 Sep 2024 08:13:43 GMT
- Title: Masks and Boxes: Combining the Best of Both Worlds for Multi-Object Tracking
- Authors: Tomasz Stanczyk, Francois Bremond
- Abstract summary: Multi-object tracking (MOT) involves identifying and consistently tracking objects across video sequences.
Traditional tracking-by-detection methods require extensive tuning and lack generalizability.
We propose a novel approach, McByte, which incorporates a temporally propagated segmentation mask as a strong association cue.
- Score: 1.2145800134384477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-object tracking (MOT) involves identifying and consistently tracking objects across video sequences. Traditional tracking-by-detection methods, while effective, often require extensive tuning and lack generalizability. On the other hand, segmentation mask-based methods are more generic but struggle with tracking management, making them unsuitable for MOT. We propose a novel approach, McByte, which incorporates a temporally propagated segmentation mask as a strong association cue within a tracking-by-detection framework. By combining bounding box and mask information, McByte enhances robustness and generalizability without per-sequence tuning. Evaluated on four benchmark datasets (DanceTrack, MOT17, SoccerNet-tracking 2022, and KITTI-tracking), McByte demonstrates performance gains in all cases examined. At the same time, it outperforms existing mask-based methods. Implementation code will be provided upon acceptance.
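The abstract does not specify how McByte weighs the mask cue against the box cue. What follows is a minimal sketch of one way a temporally propagated mask *could* serve as an extra association cue in a tracking-by-detection loop. It is not the authors' implementation. Each track's predicted box is compared with each detection by IoU. The track's propagated mask contributes the fraction of its pixels that fall inside the detection box, and the two scores are blended with a weight `alpha`. The `Track` class, `mask_fill`, `alpha`, `min_score`, the pixel-set mask representation, and the greedy matcher (a stand-in for the Hungarian assignment many trackers use) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Axis-aligned box (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive.
Box = tuple[float, float, float, float]


@dataclass
class Track:
    track_id: int
    box: Box                    # predicted box for the current frame (assumed, e.g. from a motion model)
    mask: set[tuple[int, int]]  # propagated mask as a set of (x, y) pixels (hypothetical representation)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_fill(mask: set[tuple[int, int]], box: Box) -> float:
    """Hypothetical mask cue: fraction of a track's mask pixels inside a detection box."""
    if not mask:
        return 0.0
    inside = sum(1 for x, y in mask if box[0] <= x < box[2] and box[1] <= y < box[3])
    return inside / len(mask)


def associate(tracks: list[Track], detections: list[Box],
              alpha: float = 0.5, min_score: float = 0.3) -> list[tuple[int, int]]:
    """Match tracks to detections on a blended box/mask score.

    alpha and min_score are hypothetical defaults. Greedy matching stands in for
    an optimal (Hungarian) assignment. Returns (track_id, detection_index) pairs.
    """
    candidates = []
    for ti, track in enumerate(tracks):
        for di, det in enumerate(detections):
            score = alpha * box_iou(track.box, det) + (1 - alpha) * mask_fill(track.mask, det)
            candidates.append((score, ti, di))
    candidates.sort(reverse=True)  # highest blended score first

    used_tracks, used_dets, matches = set(), set(), []
    for score, ti, di in candidates:
        if score < min_score:
            break  # all remaining candidates score lower still
        if ti in used_tracks or di in used_dets:
            continue  # each track and each detection is matched at most once
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((tracks[ti].track_id, di))
    return matches


if __name__ == "__main__":
    def square_mask(x1: int, y1: int, x2: int, y2: int) -> set[tuple[int, int]]:
        return {(x, y) for x in range(x1, x2) for y in range(y1, y2)}

    tracks = [
        Track(1, (0, 0, 10, 10), square_mask(0, 0, 10, 10)),
        Track(2, (20, 0, 30, 10), square_mask(20, 0, 30, 10)),
    ]
    detections = [(21, 0, 31, 10), (1, 0, 11, 10)]
    print(associate(tracks, detections))  # → [(2, 0), (1, 1)]
```

The greedy loop only keeps the sketch dependency-free. Building a cost matrix of `1 - score` and passing it to `scipy.optimize.linear_sum_assignment` would instead give an optimal one-to-one assignment.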
Related papers
- No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond [1.0806835533814036]
Multi-object tracking (MOT) is essential for sports analytics, enabling performance evaluation and tactical insights.
Traditional tracking-by-detection methods require extensive tuning, while segmentation-based approaches struggle with track processing.
We propose McByte, a tracking-by-detection framework that integrates a temporally propagated segmentation mask as an association cue to improve robustness without per-video tuning.
Date: 2025-06-02T07:00:15Z
- SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation [11.1906749425206]
Segment Anything 2 (SAM2) enables robust single-object tracking using segmentation.
We propose SAM2MOT, a novel Tracking by Segmentation paradigm for multi-object tracking.
SAM2MOT directly generates tracking boxes from segmentation masks, reducing reliance on detection accuracy.
Date: 2025-04-06T15:32:08Z
- ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model [20.259334882471574]
Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame.
Existing MOT methods excel at accurately tracking multiple objects in real time across various scenarios.
We propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on bounding boxes.
Date: 2024-08-28T05:53:30Z
- ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association [15.161640917854363]
We introduce ADA-Track, a novel end-to-end framework for 3D MOT from multi-view cameras.
We introduce a learnable data association module based on edge-augmented cross-attention.
We integrate this association module into the decoder layer of a DETR-based 3D detector.
Date: 2024-05-14T19:02:33Z
- Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation [37.85026590250023]
This paper proposes a Multi-object Mask-box Integrated framework for unified Tracking and Segmentation, dubbed MITS.
A novel pinpoint box predictor is proposed for accurate multi-object box prediction.
MITS achieves state-of-the-art performance on both Visual Object Tracking (VOT) and Video Object Segmentation (VOS) benchmarks.
Date: 2023-08-25T09:37:51Z
- Tracking Anything in High Quality [63.63653185865726]
HQTrack is a framework for High Quality Tracking anything in videos.
It consists of a video multi-object segmenter (VMOS) and a mask refiner (MR).
Date: 2023-07-26T06:19:46Z
- DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes [74.64897845999677]
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians.
Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available.
Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT.
Date: 2023-02-15T14:10:42Z
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches to class-agnostic tracking that also performs well for unknown object classes.
Date: 2022-10-26T10:19:37Z
- Multi-Object Tracking and Segmentation via Neural Message Passing [0.0]
Graphs offer a natural way to formulate Multiple Object Tracking (MOT) and Multiple Object Tracking and Segmentation (MOTS).
We exploit the classical network flow formulation of MOT to define a fully differentiable framework based on Message Passing Networks (MPNs).
We achieve state-of-the-art results for both tracking and segmentation in several publicly available datasets.
Date: 2022-07-15T13:03:47Z
- Robust Visual Tracking by Segmentation [103.87369380021441]
Estimating the target extent poses a fundamental challenge in visual object tracking.
We propose a segmentation-centric tracking pipeline that produces a highly accurate segmentation mask.
Our tracker is able to better learn a target representation that clearly differentiates the target in the scene from background content.
Date: 2022-03-21T17:59:19Z
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization performance leading to state-of-the-art results in both the VOS and more challenging tracking domain.
Date: 2021-01-06T18:56:24Z
- Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets [96.98888948518815]
State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm.
We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
Date: 2020-07-18T19:51:53Z
- IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency [40.354708148590696]
Instance-aware MOT (IA-MOT) can track multiple objects captured by either static or moving cameras.
Our proposed method won first place in Track 3 of the BMTT Challenge at the CVPR 2020 workshops.
Date: 2020-06-24T03:53:36Z
This list is automatically generated from the titles and abstracts of the papers on this site.