SOIT: Segmenting Objects with Instance-Aware Transformers
- URL: http://arxiv.org/abs/2112.11037v2
- Date: Thu, 23 Dec 2021 15:28:12 GMT
- Title: SOIT: Segmenting Objects with Instance-Aware Transformers
- Authors: Xiaodong Yu, Dahu Shi, Xing Wei, Ye Ren, Tingqun Ye, Wenming Tan
- Abstract summary: This paper presents an end-to-end instance segmentation framework, termed SOIT, that Segments Objects with Instance-aware Transformers.
Inspired by DETR \cite{carion2020end}, our method views instance segmentation as a direct set prediction problem.
Experimental results on the MS COCO dataset demonstrate that SOIT outperforms state-of-the-art instance segmentation approaches significantly.
- Score: 16.234574932216855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an end-to-end instance segmentation framework, termed
SOIT, that Segments Objects with Instance-aware Transformers. Inspired by DETR
\cite{carion2020end}, our method views instance segmentation as a direct set
prediction problem and effectively removes the need for many hand-crafted
components like RoI cropping, one-to-many label assignment, and non-maximum
suppression (NMS). In SOIT, multiple queries are learned to directly reason a
set of object embeddings of semantic category, bounding-box location, and
pixel-wise mask in parallel under the global image context. The class and
bounding-box can be easily embedded by a fixed-length vector. The pixel-wise
mask, especially, is embedded by a group of parameters to construct a
lightweight instance-aware transformer. Afterward, a full-resolution mask is
produced by the instance-aware transformer without involving any RoI-based
operation. Overall, SOIT introduces a simple single-stage instance segmentation
framework that is both RoI- and NMS-free. Experimental results on the MS COCO
dataset demonstrate that SOIT outperforms state-of-the-art instance
segmentation approaches significantly. Moreover, the joint learning of multiple
tasks in a unified query embedding can also substantially improve the detection
performance. Code is available at \url{https://github.com/yuxiaodongHRI/SOIT}.
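
As an illustration of the set-prediction view described in the abstract, the sketch below shows one-to-one matching between query outputs and ground-truth objects, the idea that lets DETR-style models drop NMS and one-to-many label assignment. It is not taken from the SOIT code: the cost function, the dictionary layout, and all function names are assumptions made for this example, and the brute-force search stands in for the Hungarian algorithm that DETR uses.

```python
from itertools import permutations


def box_l1(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h) in [0, 1]."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(pred, target):
    """Hypothetical cost: low when the class is likely and the box is close."""
    class_prob = pred["probs"][target["label"]]
    return -class_prob + box_l1(pred["box"], target["box"])


def best_one_to_one_matching(preds, targets):
    """Assign each target to a distinct prediction with minimal total cost.

    Brute force over permutations, so suitable for toy inputs only.
    Returns a list of (prediction_index, target_index) pairs.
    """
    best_cost, best_pairs = float("inf"), []
    for pred_ids in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(pred_ids))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, t) for t, p in enumerate(pred_ids)]
    return best_pairs


# Three query outputs and two ground-truth objects (labels: 0 = cat, 1 = dog).
preds = [
    {"probs": [0.1, 0.8], "box": (0.70, 0.70, 0.20, 0.20)},
    {"probs": [0.9, 0.1], "box": (0.25, 0.30, 0.30, 0.40)},
    {"probs": [0.2, 0.2], "box": (0.50, 0.50, 0.10, 0.10)},
]
targets = [
    {"label": 0, "box": (0.25, 0.30, 0.30, 0.40)},
    {"label": 1, "box": (0.70, 0.72, 0.20, 0.20)},
]

pairs = best_one_to_one_matching(preds, targets)
matched = {p for p, _ in pairs}
# Queries left without a target would be supervised as "no object".
unmatched = [i for i in range(len(preds)) if i not in matched]
print(pairs, unmatched)
```

Because every ground-truth object is paired with exactly one query, duplicate predictions are penalized during training rather than filtered afterward. In a model like the one the abstract describes, the matched query would also carry the mask embedding, which this sketch leaves out.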
Related papers
- Instance-Aware Generalized Referring Expression Segmentation [32.96760407482406]
InstAlign is a method that incorporates object-level reasoning into the segmentation process.
Our method significantly advances state-of-the-art performance, setting a new standard for precise and flexible GRES.
arXiv Detail & Related papers (2024-11-22T17:28:43Z)
- Matching Anything by Segmenting Anything [109.2507425045143]
We propose MASA, a novel method for robust instance association learning.
MASA learns instance-level correspondence through exhaustive data transformations.
We show that MASA achieves even better performance than state-of-the-art methods trained with fully annotated in-domain video sequences.
arXiv Detail & Related papers (2024-06-06T16:20:07Z)
- SKU-Patch: Towards Efficient Instance Segmentation for Unseen Objects in Auto-Store [102.45729472142526]
In large-scale storehouses, precise instance masks are crucial for robotic bin picking.
This paper presents a new patch-guided instance segmentation solution, leveraging only a few image patches for each incoming new SKU.
SKU-Patch yields an average of nearly 100% grasping success rate on more than 50 unseen SKUs in a robot-aided auto-store logistic pipeline.
arXiv Detail & Related papers (2023-11-08T12:44:38Z)
- SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation [22.930296667684125]
We propose a new box-supervised instance segmentation approach by developing a Semantic-aware Instance Mask (SIM) generation paradigm.
Considering that the semantic-aware prototypes cannot distinguish different instances of the same semantics, we propose a self-correction mechanism.
Extensive experimental results demonstrate the superiority of our proposed SIM approach over other state-of-the-art methods.
arXiv Detail & Related papers (2023-03-14T05:59:25Z)
- Mean Shift Mask Transformer for Unseen Object Instance Segmentation [12.371855276852195]
Mean Shift Mask Transformer (MSMFormer) is a new transformer architecture that simulates the von Mises-Fisher (vMF) mean shift clustering algorithm.
Our experiments show that MSMFormer achieves competitive performance compared to state-of-the-art methods for unseen object instance segmentation.
arXiv Detail & Related papers (2022-11-21T17:47:48Z)
- Semantic Attention and Scale Complementary Network for Instance Segmentation in Remote Sensing Images [54.08240004593062]
We propose an end-to-end multi-category instance segmentation model, which consists of a Semantic Attention (SEA) module and a Scale Complementary Mask Branch (SCMB).
The SEA module contains a simple fully convolutional semantic segmentation branch with extra supervision to strengthen the activation of instances of interest on the feature map.
SCMB extends the original single mask branch to trident mask branches and introduces complementary mask supervision at different scales.
arXiv Detail & Related papers (2021-07-25T08:53:59Z)
- SOLO: A Simple Framework for Instance Segmentation [84.00519148562606]
The notion of "instance categories" assigns a category to each pixel within an instance according to the instance's location.
SOLO is a simple, direct, and fast framework for instance segmentation with strong performance.
Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy.
arXiv Detail & Related papers (2021-06-30T09:56:54Z)
- Mask Encoding for Single Shot Instance Segmentation [97.99956029224622]
We propose a simple single-shot instance segmentation framework, termed mask encoding based instance segmentation (MEInst).
Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact and fixed-dimensional representation vector.
We show that this much simpler and more flexible one-stage instance segmentation method can also achieve competitive performance.
arXiv Detail & Related papers (2020-03-26T02:51:17Z)
- PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance requires a heavy computing burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.