SOLQ: Segmenting Objects by Learning Queries
- URL: http://arxiv.org/abs/2106.02351v1
- Date: Fri, 4 Jun 2021 09:03:31 GMT
- Title: SOLQ: Segmenting Objects by Learning Queries
- Authors: Bin Dong, Fangao Zeng, Tiancai Wang, Xiangyu Zhang, Yichen Wei
- Abstract summary: In SOLQ, each query represents one object and has multiple representations: class, location and mask.
SOLQ can achieve state-of-the-art performance, surpassing most existing approaches.
Joint learning of the unified query representation can greatly improve the detection performance of the original DETR.
- Score: 33.02115826341877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose an end-to-end framework for instance segmentation.
Based on the recently introduced DETR [1], our method, termed SOLQ, segments
objects by learning unified queries. In SOLQ, each query represents one object
and has multiple representations: class, location and mask. The object queries
learned perform classification, box regression and mask encoding simultaneously
in a unified vector form. During the training phase, the encoded mask vectors are
supervised by the compression coding of raw spatial masks. At inference time,
the produced mask vectors can be directly transformed to spatial masks by the
inverse process of compression coding. Experimental results show that SOLQ can
achieve state-of-the-art performance, surpassing most existing approaches.
Moreover, the joint learning of the unified query representation can greatly
improve the detection performance of the original DETR. We hope SOLQ can serve
as a strong baseline for Transformer-based instance segmentation. Code is
available at https://github.com/megvii-research/SOLQ.
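The abstract describes mask vectors that are supervised by a compression coding of the raw spatial mask and decoded by the inverse process at inference. The sketch below illustrates that round trip under stated assumptions: the abstract does not name the coding, so a 2-D discrete cosine transform (the idea used by DCT-Mask in the related papers below) is assumed here, and the low-frequency coefficients are kept as a square block for simplicity. The names dct_matrix, encode_mask and decode_mask are hypothetical and are not taken from the SOLQ code base.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalisation
    return mat


def encode_mask(mask: np.ndarray, keep: int) -> np.ndarray:
    """Compress a square binary mask into a vector of keep * keep coefficients.

    Assumption: the lowest frequencies are kept as a square block.
    """
    d = dct_matrix(mask.shape[0])
    coeffs = d @ mask.astype(np.float64) @ d.T  # separable 2-D DCT
    return coeffs[:keep, :keep].reshape(-1)


def decode_mask(vec: np.ndarray, size: int, keep: int) -> np.ndarray:
    """Invert encode_mask: zero-pad the coefficients, apply the inverse DCT, threshold."""
    d = dct_matrix(size)
    coeffs = np.zeros((size, size))
    coeffs[:keep, :keep] = vec.reshape(keep, keep)
    recon = d.T @ coeffs @ d  # inverse of an orthonormal transform
    return recon > 0.5


# Toy example: a rectangular "object" on a 32 x 32 grid.
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 6:20] = True

target_vec = encode_mask(mask, keep=12)  # regression target for the mask vector
pred_mask = decode_mask(target_vec, size=32, keep=12)  # inference-time decoding
iou = (mask & pred_mask).sum() / (mask | pred_mask).sum()
print(target_vec.shape, round(float(iou), 3))
```

In a setup like this, a query's mask branch would regress target_vec during training, and decode_mask would be applied to the predicted vector at inference. A smaller keep gives a more compact vector at the cost of blurrier mask boundaries.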
Related papers
- Shift and matching queries for video semantic segmentation [0.0]
We propose a method to extend a query-based image segmentation model to video.
The method uses a query-based architecture, where decoded queries represent segmentation masks.
Experimental results on CityScapes-VPS and VSPW show significant improvements from the baselines.
arXiv Detail & Related papers (2024-10-10T06:07:33Z)
- DQFormer: Towards Unified LiDAR Panoptic Segmentation with Decoupled Queries [14.435906383301555]
We propose a novel framework dubbed DQFormer to implement semantic and instance segmentation in a unified workflow.
Specifically, we design a decoupled query generator to propose informative queries with semantics by localizing things/stuff positions.
We also introduce a query-oriented mask decoder to decode corresponding segmentation masks.
arXiv Detail & Related papers (2024-08-28T14:14:33Z)
- Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation [62.275143240798236]
Video semantic segmentation datasets have limited categories per video.
Less than 10% of queries could be matched to receive meaningful gradient updates during VSS training.
Our method achieves state-of-the-art performance on the latest challenging VSS benchmark VSPW without bells and whistles.
arXiv Detail & Related papers (2023-09-14T20:31:06Z)
- A Unified Query-based Paradigm for Camouflaged Instance Segmentation [26.91533966120182]
We propose a unified query-based multi-task learning framework for camouflaged instance segmentation, termed UQFormer.
Our model views instance segmentation as a query-based direct set prediction problem, without other post-processing such as non-maximum suppression.
Compared with 14 state-of-the-art approaches, our UQFormer significantly improves the performance of camouflaged instance segmentation.
arXiv Detail & Related papers (2023-08-14T18:23:18Z)
- Mask Matching Transformer for Few-Shot Segmentation [71.32725963630837]
Mask Matching Transformer (MM-Former) is a new paradigm for the few-shot segmentation task.
First, the MM-Former follows the paradigm of decomposing first and then blending, allowing our method to benefit from an advanced potential-object segmenter.
We conduct extensive experiments on the popular COCO-$20^i$ and Pascal-$5^i$ benchmarks.
arXiv Detail & Related papers (2022-12-05T11:00:32Z)
- SOIT: Segmenting Objects with Instance-Aware Transformers [16.234574932216855]
This paper presents an end-to-end instance segmentation framework, termed SOIT, that Segments Objects with Instance-aware Transformers.
Inspired by DETR (Carion et al., 2020), our method views instance segmentation as a direct set prediction problem.
Experimental results on the MS COCO dataset demonstrate that SOIT outperforms state-of-the-art instance segmentation approaches significantly.
arXiv Detail & Related papers (2021-12-21T08:23:22Z)
- QueryInst: Parallelly Supervised Mask Query for Instance Segmentation [53.5613957875507]
We present QueryInst, a query based instance segmentation method driven by parallel supervision on dynamic mask heads.
We conduct extensive experiments on three challenging benchmarks, i.e., COCO, CityScapes, and YouTube-VIS.
QueryInst achieves the best performance among all online VIS approaches and strikes a decent speed-accuracy trade-off.
arXiv Detail & Related papers (2021-05-05T08:38:25Z)
- DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation [50.70679435176346]
We propose a new mask representation by applying the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector.
Our method, termed DCT-Mask, could be easily integrated into most pixel-based instance segmentation methods.
arXiv Detail & Related papers (2020-11-19T15:00:21Z)
- Mask Encoding for Single Shot Instance Segmentation [97.99956029224622]
We propose a simple single-shot instance segmentation framework, termed mask encoding based instance segmentation (MEInst).
Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact and fixed-dimensional representation vector.
We show that this much simpler and more flexible one-stage instance segmentation method can also achieve competitive performance.
arXiv Detail & Related papers (2020-03-26T02:51:17Z)
- SOLOv2: Dynamic and Fast Instance Segmentation [102.15325936477362]
We build a simple, direct, and fast instance segmentation framework with strong performance.
We take one step further by dynamically learning the mask head of the object segmenter.
We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy.
arXiv Detail & Related papers (2020-03-23T09:44:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.