MOPT: Multi-Object Panoptic Tracking
- URL: http://arxiv.org/abs/2004.08189v2
- Date: Wed, 27 May 2020 14:57:01 GMT
- Title: MOPT: Multi-Object Panoptic Tracking
- Authors: Juana Valeria Hurtado, Rohit Mohan, Wolfram Burgard, Abhinav Valada
- Abstract summary: We introduce a novel perception task denoted as multi-object panoptic tracking (MOPT).
MOPT allows for exploiting pixel-level semantic information of 'thing' and 'stuff' classes, temporal coherence, and pixel-level associations over time.
We present extensive quantitative and qualitative evaluations of both vision-based and LiDAR-based MOPT that demonstrate encouraging results.
- Score: 33.77171216778909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Comprehensive understanding of dynamic scenes is a critical prerequisite for
intelligent robots to autonomously operate in their environment. Research in
this domain, which encompasses diverse perception problems, has primarily been
focused on addressing specific tasks individually rather than modeling the
ability to understand dynamic scenes holistically. In this paper, we introduce
a novel perception task denoted as multi-object panoptic tracking (MOPT), which
unifies the conventionally disjoint tasks of semantic segmentation, instance
segmentation, and multi-object tracking. MOPT allows for exploiting pixel-level
semantic information of 'thing' and 'stuff' classes, temporal coherence, and
pixel-level associations over time, for the mutual benefit of each of the
individual sub-problems. To facilitate quantitative evaluations of MOPT in a
unified manner, we propose the soft panoptic tracking quality (sPTQ) metric. As
a first step towards addressing this task, we propose the novel
PanopticTrackNet architecture that builds upon the state-of-the-art top-down
panoptic segmentation network EfficientPS by adding a new tracking head to
simultaneously learn all sub-tasks in an end-to-end manner. Additionally, we
present several strong baselines that combine predictions from state-of-the-art
panoptic segmentation and multi-object tracking models for comparison. We
present extensive quantitative and qualitative evaluations of both vision-based
and LiDAR-based MOPT that demonstrate encouraging results.
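For concreteness, the sketch below illustrates the soft-penalty idea behind an sPTQ-style score in Python: a panoptic-quality (PQ) style ratio in which matched segments whose track ID switched are penalized by their IoU rather than by a hard count. The matching step, the function name `soft_ptq`, and its arguments are illustrative assumptions; consult the paper for the precise definition.

```python
# Minimal sketch of a soft panoptic tracking quality (sPTQ) style score for
# one class. Assumes PQ-style matching (TP/FP/FN via IoU > 0.5 overlap) has
# already been computed; names and arguments here are illustrative.

def soft_ptq(tp_ious, num_fp, num_fn, ids_ious):
    """tp_ious:  IoUs of all true-positive segment matches.
    num_fp:   count of unmatched predicted segments (false positives).
    num_fn:   count of unmatched ground-truth segments (false negatives).
    ids_ious: IoUs of matched segments whose track ID switched; subtracting
              these soft IoU values, instead of 1 per switch, is what makes
              the penalty "soft".
    """
    denom = len(tp_ious) + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0
    return (sum(tp_ious) - sum(ids_ious)) / denom

# Example: three matches, one of which switched track ID with IoU 0.8.
print(soft_ptq([0.9, 0.8, 0.7], num_fp=1, num_fn=2, ids_ious=[0.8]))  # ~0.356
```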
Related papers
- IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking [13.977088329815933]
Multi-Object Tracking (MOT) aims to associate multiple objects across video frames.
Most existing approaches train and track within a single domain, resulting in a lack of cross-domain generalizability.
We develop IP-MOT, an end-to-end transformer model for MOT that operates without concrete textual descriptions.
arXiv Detail & Related papers (2024-10-30T14:24:56Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method achieves significant improvements on the MOT17 and MOT20 datasets and reaches state-of-the-art performance on the DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z)
- CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation [31.167405688707575]
We propose a framework for instance-level visual analysis on video frames.
It can simultaneously conduct object detection, instance segmentation, and multi-object tracking.
We evaluate the proposed method extensively on KITTI MOTS and MOTS Challenge datasets.
arXiv Detail & Related papers (2023-11-02T04:32:24Z)
- A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z)
- Position-Aware Contrastive Alignment for Referring Image Segmentation [65.16214741785633]
We present a position-aware contrastive alignment network (PCAN) to enhance the alignment of multi-modal features.
Our PCAN consists of two modules: 1) Position Aware Module (PAM), which provides position information of all objects related to natural language descriptions, and 2) Contrastive Language Understanding Module (CLUM), which enhances multi-modal alignment.
arXiv Detail & Related papers (2022-12-27T09:13:19Z)
- Multi-target tracking for video surveillance using deep affinity network: a brief review [0.0]
Multi-target tracking (MTT) for video surveillance is an important and challenging task.
Deep learning models, loosely inspired by the human brain, are increasingly used to learn the affinity features that drive data association in MTT.
arXiv Detail & Related papers (2021-10-29T10:44:26Z)
- Weakly Supervised Multi-Object Tracking and Segmentation [21.7184457265122]
We introduce the problem of weakly supervised Multi-Object Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking.
To address it, we design a novel synergistic training strategy by taking advantage of multi-task learning.
We evaluate our method on KITTI MOTS, the most representative benchmark for this task, reducing the performance gap on the MOTSP metric between the fully supervised and weakly supervised approaches to just 12% and 12.7% for cars and pedestrians, respectively.
arXiv Detail & Related papers (2021-01-03T17:06:43Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results in terms of performance, computation, and memory footprint.
We provide a well-rounded view of state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.