QueryInst: Parallelly Supervised Mask Query for Instance Segmentation
- URL: http://arxiv.org/abs/2105.01928v1
- Date: Wed, 5 May 2021 08:38:25 GMT
- Title: QueryInst: Parallelly Supervised Mask Query for Instance Segmentation
- Authors: Yuxin Fang, Shusheng Yang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan,
Bin Feng, Wenyu Liu
- Abstract summary: We present QueryInst, a query-based instance segmentation method driven by parallel supervision on dynamic mask heads.
We conduct extensive experiments on three challenging benchmarks, i.e., COCO, CityScapes, and YouTube-VIS.
QueryInst achieves the best performance among all online VIS approaches and strikes a decent speed-accuracy trade-off.
- Score: 53.5613957875507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, query-based object detection frameworks have achieved performance
comparable to previous state-of-the-art object detectors. However, how to
fully leverage such frameworks to perform instance segmentation remains an open
problem. In this paper, we present QueryInst, a query-based instance
segmentation method driven by parallel supervision on dynamic mask heads. The
key insight of QueryInst is to leverage the intrinsic one-to-one correspondence
in object queries across different stages, as well as the one-to-one correspondence
between mask RoI features and object queries in the same stage. This approach
eliminates the explicit multi-stage mask head connection and the proposal
distribution inconsistency issues inherent in non-query-based multi-stage
instance segmentation methods. We conduct extensive experiments on three
challenging benchmarks, i.e., COCO, CityScapes, and YouTube-VIS, to evaluate the
effectiveness of QueryInst on instance segmentation and video instance
segmentation (VIS) tasks. Specifically, using a ResNet-101-FPN backbone, QueryInst
obtains 48.1 box AP and 42.8 mask AP on COCO test-dev, which is 2 points higher
than HTC in terms of both box AP and mask AP, while running 2.4 times faster. For
video instance segmentation, QueryInst achieves the best performance among all
online VIS approaches and strikes a decent speed-accuracy trade-off. Code is
available at \url{https://github.com/hustvl/QueryInst}.
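To make the mechanism described above concrete, here is a minimal sketch of per-stage dynamic mask heads trained with parallel mask supervision: each stage's object queries generate parameters for that stage's mask head, applied one-to-one to the corresponding mask RoI features, and the mask losses of all stages are summed with no explicit connection between consecutive mask heads. Module names, tensor shapes, and the simplified query-feature interaction are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Hedged sketch of per-stage dynamic mask heads with parallel supervision.
# Shapes, module names, and the simplified dynamic interaction are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicMaskHead(nn.Module):
    """One stage: each object query generates per-instance parameters that are
    applied to that instance's mask RoI features (one-to-one correspondence)."""

    def __init__(self, q_dim=256, feat_dim=256):
        super().__init__()
        self.param_gen = nn.Linear(q_dim, feat_dim)      # query -> dynamic params
        self.mask_pred = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, queries, mask_roi_feats):
        # queries:        (N, q_dim)    one query per instance
        # mask_roi_feats: (N, C, S, S)  one RoI feature map per instance
        params = self.param_gen(queries)                 # (N, C)
        # Simplified "dynamic" interaction: channel-wise modulation by the query.
        feats = F.relu(mask_roi_feats * params[:, :, None, None])
        return self.mask_pred(feats).squeeze(1)          # (N, S, S) mask logits


def parallel_mask_loss(stage_heads, stage_queries, stage_roi_feats, gt_masks):
    """Mask losses are computed at every stage in parallel and summed; there is
    no explicit connection between the mask heads of consecutive stages."""
    loss = 0.0
    for head, q, feats in zip(stage_heads, stage_queries, stage_roi_feats):
        logits = head(q, feats)
        loss = loss + F.binary_cross_entropy_with_logits(logits, gt_masks)
    return loss


if __name__ == "__main__":
    stages, n_inst, size = 6, 8, 14
    heads = nn.ModuleList([DynamicMaskHead() for _ in range(stages)])
    queries = [torch.randn(n_inst, 256) for _ in range(stages)]
    roi_feats = [torch.randn(n_inst, 256, size, size) for _ in range(stages)]
    gt = (torch.rand(n_inst, size, size) > 0.5).float()
    print(parallel_mask_loss(heads, queries, roi_feats, gt))
```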
Related papers
- DQFormer: Towards Unified LiDAR Panoptic Segmentation with Decoupled Queries [14.435906383301555]
We propose a novel framework dubbed DQFormer to implement semantic and instance segmentation in a unified workflow.
Specifically, we design a decoupled query generator to propose informative queries with semantics by localizing things/stuff positions.
We also introduce a query-oriented mask decoder to decode corresponding segmentation masks.
arXiv Detail & Related papers (2024-08-28T14:14:33Z)
- A Unified Query-based Paradigm for Camouflaged Instance Segmentation [26.91533966120182]
We propose a unified query-based multi-task learning framework for camouflaged instance segmentation, termed UQFormer.
Our model views instance segmentation as a query-based direct set prediction problem, without post-processing such as non-maximum suppression (a matching sketch follows this entry).
Compared with 14 state-of-the-art approaches, our UQFormer significantly improves the performance of camouflaged instance segmentation.
arXiv Detail & Related papers (2023-08-14T18:23:18Z)
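The query-based direct set prediction mentioned in the UQFormer summary above is typically trained with one-to-one bipartite (Hungarian) matching between queries and ground-truth instances, which is what removes the need for NMS. Below is a minimal matching sketch; the cost terms and weights are assumptions for illustration, not UQFormer's actual matching cost.

```python
# Hedged sketch of bipartite (Hungarian) matching for query-based set prediction.
# The classification/Dice cost terms and their weights are illustrative only.
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_queries_to_gt(pred_cls, pred_masks, gt_labels, gt_masks,
                        w_cls=1.0, w_mask=1.0):
    # pred_cls:   (Q, K)    per-query class probabilities
    # pred_masks: (Q, H, W) per-query mask probabilities in [0, 1]
    # gt_labels:  (G,)      ground-truth class indices
    # gt_masks:   (G, H, W) binary ground-truth masks
    cls_cost = -pred_cls[:, gt_labels]                         # (Q, G)
    p = pred_masks.reshape(len(pred_masks), -1)                # (Q, HW)
    g = gt_masks.reshape(len(gt_masks), -1).astype(float)      # (G, HW)
    inter = p @ g.T                                            # (Q, G)
    dice_cost = 1 - (2 * inter + 1) / (p.sum(1, keepdims=True) + g.sum(1) + 1)
    cost = w_cls * cls_cost + w_mask * dice_cost               # (Q, G)
    # One ground-truth instance per matched query -> no NMS is needed.
    return linear_sum_assignment(cost)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, G, K, H, W = 10, 3, 5, 16, 16
    pred_cls = rng.random((Q, K))
    pred_cls /= pred_cls.sum(1, keepdims=True)
    pred_masks = rng.random((Q, H, W))
    gt_labels = rng.integers(0, K, size=G)
    gt_masks = rng.random((G, H, W)) > 0.5
    print(match_queries_to_gt(pred_cls, pred_masks, gt_labels, gt_masks))
```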
- Learning Equivariant Segmentation with Instance-Unique Querying [47.52528819153683]
We devise a new training framework that boosts query-based models through discriminative query embedding learning.
Our algorithm uses the queries to retrieve the corresponding instances from the whole training dataset.
On top of four well-known query-based models, our training algorithm provides significant performance gains.
arXiv Detail & Related papers (2022-10-03T13:14:00Z)
- BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video [58.71785546245467]
Multiple existing benchmarks involve tracking and segmenting objects in video.
There is little interaction between them due to the use of disparate benchmark datasets and metrics.
We propose BURST, a dataset which contains thousands of diverse videos with high-quality object masks.
All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison.
arXiv Detail & Related papers (2022-09-25T01:27:35Z)
- Mask Encoding for Single Shot Instance Segmentation [97.99956029224622]
We propose a simple single-shot instance segmentation framework, termed mask encoding based instance segmentation (MEInst).
Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact, fixed-dimensional representation vector (see the encoding sketch after this entry).
We show that this much simpler and more flexible one-stage instance segmentation method can also achieve competitive performance.
arXiv Detail & Related papers (2020-03-26T02:51:17Z)
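A minimal sketch of the fixed-dimensional mask encoding idea from the MEInst summary above: each binary mask is compressed into a short vector and decoded back to a two-dimensional mask at prediction time. The mask resolution, code length, and the use of scikit-learn's PCA here are assumptions for illustration rather than the paper's exact recipe.

```python
# Hedged sketch: encode 2-D masks into compact fixed-dimensional vectors.
# Mask size (28x28), code length (60), and PCA are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

S, DIM = 28, 60  # mask resolution, code length

# Placeholder "training" masks; in practice these would be ground-truth masks.
rng = np.random.default_rng(0)
train_masks = (rng.random((1000, S, S)) > 0.5).astype(np.float32)

pca = PCA(n_components=DIM)
pca.fit(train_masks.reshape(len(train_masks), -1))


def encode(mask):
    """(S, S) binary mask -> (DIM,) compact code."""
    return pca.transform(mask.reshape(1, -1))[0]


def decode(code):
    """(DIM,) code -> (S, S) binary mask."""
    rec = pca.inverse_transform(code.reshape(1, -1))[0]
    return (rec.reshape(S, S) > 0.5).astype(np.float32)


m = train_masks[0]
print(np.abs(decode(encode(m)) - m).mean())  # reconstruction error
```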
- SOLOv2: Dynamic and Fast Instance Segmentation [102.15325936477362]
We build a simple, direct, and fast instance segmentation framework with strong performance.
We take one step further by dynamically learning the mask head of the object segmenter (a sketch follows this entry).
We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy.
arXiv Detail & Related papers (2020-03-23T09:44:21Z)
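A minimal sketch of the dynamically learned mask head described in the SOLOv2 summary above: a kernel branch predicts per-instance convolution weights, which are then applied to a shared mask feature map to produce one mask per instance. Channel sizes and branch structure are assumptions, not the paper's exact design.

```python
# Hedged sketch of a dynamically predicted mask head: per-instance 1x1 conv
# kernels are generated from features and applied to shared mask features.
# Channel sizes and the single-conv branches are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicMaskBranch(nn.Module):
    def __init__(self, in_ch=256, mask_ch=128):
        super().__init__()
        self.feat_branch = nn.Conv2d(in_ch, mask_ch, 3, padding=1)    # shared mask features
        self.kernel_branch = nn.Conv2d(in_ch, mask_ch, 3, padding=1)  # per-location kernels

    def forward(self, fpn_feat, locations):
        # fpn_feat:  (1, C, H, W)  feature map of a single image
        # locations: (N, 2)        grid cells (y, x) that fired for N instances
        mask_feat = F.relu(self.feat_branch(fpn_feat))                # (1, E, H, W)
        kernel_map = self.kernel_branch(fpn_feat)                     # (1, E, H, W)
        ys, xs = locations[:, 0], locations[:, 1]
        kernels = kernel_map[0].permute(1, 2, 0)[ys, xs]              # (N, E)
        # Each instance's kernel acts as a 1x1 convolution over the shared features.
        masks = F.conv2d(mask_feat, kernels[:, :, None, None])        # (1, N, H, W)
        return masks[0]                                               # (N, H, W) mask logits


if __name__ == "__main__":
    net = DynamicMaskBranch()
    feat = torch.randn(1, 256, 64, 64)
    locs = torch.tensor([[10, 20], [30, 40]])
    print(net(feat, locs).shape)  # torch.Size([2, 64, 64])
```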
- PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance requires a heavy computing burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)