CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation
- URL: http://arxiv.org/abs/2311.00987v1
- Date: Thu, 2 Nov 2023 04:32:24 GMT
- Authors: Yiming Cui, Cheng Han, Dongfang Liu
- Abstract summary: We propose a framework for instance-level visual analysis on video frames.
It can simultaneously conduct object detection, instance segmentation, and multi-object tracking.
We evaluate the proposed method extensively on KITTI MOTS and MOTS Challenge datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advancement of computer vision has pushed visual analysis tasks from
still images to the video domain. In recent years, video instance segmentation,
which aims to track and segment multiple objects in video frames, has drawn
much attention for its potential applications in various emerging areas such as
autonomous driving, intelligent transportation, and smart retail. In this
paper, we propose an effective framework for instance-level visual analysis on
video frames, which can simultaneously conduct object detection, instance
segmentation, and multi-object tracking. The core idea of our method is
collaborative multi-task learning, achieved through a novel structure of
associative connections among the detection, segmentation, and tracking task
heads in an end-to-end learnable CNN. These additional connections allow
information to propagate across related tasks, benefiting all of them
simultaneously. We evaluate the proposed method extensively on the KITTI MOTS
and MOTS Challenge datasets and obtain encouraging results.
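The abstract's "associative connections" can be illustrated with a minimal sketch: intermediate features from the detection head are forwarded to the segmentation head, and both feed the tracking head. All function and variable names below are hypothetical placeholders, not from the paper's code, and the arithmetic stands in for learned layers.

```python
# Hypothetical sketch of associative connections among task heads.
# Detection features feed segmentation; detection + segmentation
# features feed tracking. Plain-Python stand-ins for learned layers.

def detection_head(backbone_feat):
    # Predict boxes and expose an intermediate feature to share downstream.
    det_feat = [x * 0.5 for x in backbone_feat]
    boxes = sum(det_feat)
    return boxes, det_feat

def segmentation_head(backbone_feat, det_feat):
    # Associative connection: fuse detection features before mask prediction.
    seg_feat = [b + d for b, d in zip(backbone_feat, det_feat)]
    mask_score = sum(seg_feat) / len(seg_feat)
    return mask_score, seg_feat

def tracking_head(backbone_feat, det_feat, seg_feat):
    # Tracking embedding conditioned on both upstream task features.
    return [b + d + s for b, d, s in zip(backbone_feat, det_feat, seg_feat)]

backbone_feat = [1.0, 2.0, 3.0]          # toy backbone output
boxes, det_feat = detection_head(backbone_feat)
mask_score, seg_feat = segmentation_head(backbone_feat, det_feat)
track_embed = tracking_head(backbone_feat, det_feat, seg_feat)
```

Because the connections are ordinary feature passes inside one network, gradients from the segmentation and tracking losses also flow back into the detection branch, which is what makes the multi-task learning collaborative rather than three independent heads on a shared backbone.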
Related papers
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
This task combines the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT).
Existing approaches to OVMOT often combine OVD and MOT methodologies as separate modules, predominantly viewing the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z)
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt-tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse point and box tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z)
- Deep Learning Techniques for Video Instance Segmentation: A Survey [19.32547752428875]
Video instance segmentation is an emerging computer vision research area introduced in 2019.
Deep-learning techniques take a dominant role in various computer vision areas.
This survey offers a multifaceted view of deep-learning schemes for video instance segmentation.
arXiv Detail & Related papers (2023-10-19T00:27:30Z)
- Unifying Tracking and Image-Video Object Detection [54.91658924277527]
TrIVD (Tracking and Image-Video Detection) is the first framework that unifies image OD, video OD, and MOT within one end-to-end model.
To handle the discrepancies and semantic overlaps of category labels, TrIVD formulates detection/tracking as grounding and reasons about object categories.
arXiv Detail & Related papers (2022-11-20T20:30:28Z)
- BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video [58.71785546245467]
Multiple existing benchmarks involve tracking and segmenting objects in video.
There is little interaction between them due to the use of disparate benchmark datasets and metrics.
We propose BURST, a dataset which contains thousands of diverse videos with high-quality object masks.
All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison.
arXiv Detail & Related papers (2022-09-25T01:27:35Z)
- Multi-target tracking for video surveillance using deep affinity network: a brief review [0.0]
Multi-target tracking (MTT) for video surveillance is an important and challenging task.
Deep learning models are loosely inspired by the workings of the human brain.
arXiv Detail & Related papers (2021-10-29T10:44:26Z) - A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.