CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation
- URL: http://arxiv.org/abs/2311.00987v1
- Date: Thu, 2 Nov 2023 04:32:24 GMT
- Authors: Yiming Cui, Cheng Han, Dongfang Liu
- Abstract summary: We propose a framework for instance-level visual analysis on video frames.
It can simultaneously conduct object detection, instance segmentation, and multi-object tracking.
We evaluate the proposed method extensively on KITTI MOTS and MOTS Challenge datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advancement of computer vision has pushed visual analysis tasks from
still images to the video domain. In recent years, video instance segmentation,
which aims to track and segment multiple objects in video frames, has drawn
much attention for its potential applications in various emerging areas such as
autonomous driving, intelligent transportation, and smart retail. In this
paper, we propose an effective framework for instance-level visual analysis on
video frames, which can simultaneously conduct object detection, instance
segmentation, and multi-object tracking. The core idea of our method is
collaborative multi-task learning, achieved through a novel structure of
associative connections among the detection, segmentation, and tracking task
heads in an end-to-end learnable CNN. These additional connections allow
information to propagate across related tasks, benefiting all of them
simultaneously. We evaluate the proposed method extensively on the KITTI MOTS
and MOTS Challenge datasets and obtain encouraging results.
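The abstract's "associative connections" can be illustrated with a minimal sketch: intermediate features from the detection head are forwarded to the segmentation head, and both feed the tracking head. All function and variable names below are hypothetical placeholders, not from the paper's code, and the arithmetic stands in for learned layers.

```python
# Hypothetical sketch of associative connections among task heads.
# Detection features feed segmentation; detection + segmentation
# features feed tracking. Plain-Python stand-ins for learned layers.

def detection_head(backbone_feat):
    # Predict boxes and expose an intermediate feature to share downstream.
    det_feat = [x * 0.5 for x in backbone_feat]
    boxes = sum(det_feat)
    return boxes, det_feat

def segmentation_head(backbone_feat, det_feat):
    # Associative connection: fuse detection features before mask prediction.
    seg_feat = [b + d for b, d in zip(backbone_feat, det_feat)]
    mask_score = sum(seg_feat) / len(seg_feat)
    return mask_score, seg_feat

def tracking_head(backbone_feat, det_feat, seg_feat):
    # Tracking embedding conditioned on both upstream task features.
    return [b + d + s for b, d, s in zip(backbone_feat, det_feat, seg_feat)]

backbone_feat = [1.0, 2.0, 3.0]          # toy backbone output
boxes, det_feat = detection_head(backbone_feat)
mask_score, seg_feat = segmentation_head(backbone_feat, det_feat)
track_embed = tracking_head(backbone_feat, det_feat, seg_feat)
```

Because the connections are ordinary feature passes inside one network, gradients from the segmentation and tracking losses also flow back into the detection branch, which is what makes the multi-task learning collaborative rather than three independent heads on a shared backbone.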
Related papers
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
This task combines the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT).
Existing approaches to OVMOT often combine OVD and MOT methodologies as separate modules, predominantly viewing the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z)
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt-tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse point and box tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z)
- Deep Learning Techniques for Video Instance Segmentation: A Survey [19.32547752428875]
Video instance segmentation is an emerging computer vision research area introduced in 2019.
Deep-learning techniques take a dominant role in various computer vision areas.
This survey offers a multifaceted view of deep-learning schemes for video instance segmentation.
arXiv Detail & Related papers (2023-10-19T00:27:30Z)
- Unifying Tracking and Image-Video Object Detection [54.91658924277527]
TrIVD (Tracking and Image-Video Detection) is the first framework that unifies image OD, video OD, and MOT within one end-to-end model.
To handle the discrepancies and semantic overlaps of category labels, TrIVD formulates detection/tracking as grounding and reasons about object categories.
arXiv Detail & Related papers (2022-11-20T20:30:28Z)
- BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video [58.71785546245467]
Multiple existing benchmarks involve tracking and segmenting objects in video.
There is little interaction between them due to the use of disparate benchmark datasets and metrics.
We propose BURST, a dataset which contains thousands of diverse videos with high-quality object masks.
All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison.
arXiv Detail & Related papers (2022-09-25T01:27:35Z)
- Multi-target tracking for video surveillance using deep affinity network: a brief review [0.0]
Multi-target tracking (MTT) for video surveillance is an important and challenging task.
Deep learning models are loosely inspired by the workings of the human brain.
arXiv Detail & Related papers (2021-10-29T10:44:26Z) - A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.