Mask DINO: Towards A Unified Transformer-based Framework for Object
Detection and Segmentation
- URL: http://arxiv.org/abs/2206.02777v1
- Date: Mon, 6 Jun 2022 17:57:25 GMT
- Title: Mask DINO: Towards A Unified Transformer-based Framework for Object
Detection and Segmentation
- Authors: Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni,
and Heung-Yeung Shum
- Abstract summary: Mask DINO is a unified object detection and segmentation framework.
Mask DINO is simple, efficient, scalable, and benefits from joint large-scale detection and segmentation datasets.
- Score: 15.826822450977271
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we present Mask DINO, a unified object detection and
segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising
Anchor Boxes) by adding a mask prediction branch which supports all image
segmentation tasks (instance, panoptic, and semantic). It makes use of the
query embeddings from DINO to dot-product a high-resolution pixel embedding map
to predict a set of binary masks. Some key components in DINO are extended for
segmentation through a shared architecture and training process. Mask DINO is
simple, efficient, scalable, and benefits from joint large-scale detection and
segmentation datasets. Our experiments show that Mask DINO significantly
outperforms all existing specialized segmentation methods, both on a ResNet-50
backbone and a pre-trained model with SwinL backbone. Notably, Mask DINO
establishes the best results to date on instance segmentation (54.5 AP on
COCO), panoptic segmentation (59.4 PQ on COCO), and semantic segmentation (60.8
mIoU on ADE20K). Code will be available at
\url{https://github.com/IDEACVR/MaskDINO}.
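The abstract's core mechanism, dot-producting the decoder's query embeddings with a high-resolution pixel embedding map to produce binary masks, can be sketched as below. This is an illustrative sketch only: the function name, shapes, and thresholding are assumptions, not the official Mask DINO implementation.

```python
import numpy as np

def predict_masks(queries, pixel_embeddings, threshold=0.0):
    """queries: (N, C) query embeddings from the Transformer decoder.
    pixel_embeddings: (C, H, W) high-resolution per-pixel embedding map.
    Returns an (N, H, W) boolean array: one binary mask per query."""
    C, H, W = pixel_embeddings.shape
    # (N, C) @ (C, H*W) -> (N, H*W): one mask-logit map per query
    logits = queries @ pixel_embeddings.reshape(C, H * W)
    return (logits > threshold).reshape(-1, H, W)

rng = np.random.default_rng(0)
queries = rng.standard_normal((3, 16))       # 3 object queries, 16-dim
pixel_map = rng.standard_normal((16, 8, 8))  # 16-dim embedding per pixel
masks = predict_masks(queries, pixel_map)
print(masks.shape)  # (3, 8, 8)
```

Because the same query embeddings also produce box predictions in DINO, detection and segmentation share one set of queries, which is what makes the joint framework possible.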
Related papers
- MaskUno: Switch-Split Block For Enhancing Instance Segmentation [0.0]
We propose replacing mask prediction with a Switch-Split block that processes refined ROIs, classifies them, and assigns them to specialized mask predictors.
An increase in the mean Average Precision (mAP) of 2.03% was observed for the high-performing DetectoRS when trained on 80 classes.
arXiv Detail & Related papers (2024-07-31T10:12:14Z)
- MaskRange: A Mask-classification Model for Range-view based LiDAR Segmentation [34.04740351544143]
We propose a unified mask-classification model, MaskRange, for the range-view based LiDAR semantic and panoptic segmentation.
Our MaskRange achieves state-of-the-art performance with $66.10$ mIoU on semantic segmentation and promising results with $53.10$ PQ on panoptic segmentation with high efficiency.
arXiv Detail & Related papers (2022-06-24T04:39:49Z)
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
arXiv Detail & Related papers (2022-06-13T17:59:43Z)
- Masked-attention Mask Transformer for Universal Image Segmentation [180.73009259614494]
We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance, or semantic).
Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions.
In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets.
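The masked attention described above, restricting each query's cross-attention to its previously predicted mask region, can be sketched as follows. The function name and shapes are illustrative assumptions; the empty-mask fallback to full attention is a standard safeguard, not necessarily Mask2Former's exact implementation.

```python
import numpy as np

def masked_cross_attention(Q, K, V, mask):
    """Q: (N, d) object queries; K, V: (P, d) flattened pixel features;
    mask: (N, P) boolean foreground region predicted by the previous layer.
    Each query attends only to pixels inside its predicted mask."""
    d = Q.shape[1]
    scores = (Q @ K.T) / np.sqrt(d)         # (N, P) attention logits
    scores = np.where(mask, scores, -1e9)   # suppress background pixels
    # Safeguard: a query with an empty mask falls back to full attention
    empty = ~mask.any(axis=1)
    scores[empty] = (Q[empty] @ K.T) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                      # (N, d) updated queries

rng = np.random.default_rng(1)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((5, 4))
V = rng.standard_normal((5, 4))
mask = np.zeros((2, 5), dtype=bool)
mask[0, 2] = True   # query 0 may only look at pixel 2
mask[1, :] = True   # query 1 attends everywhere
out = masked_cross_attention(Q, K, V, mask)
print(out.shape)  # (2, 4)
```

When a query's mask covers a single pixel, its output collapses to that pixel's value, which shows how the constraint localizes the features each query extracts.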
arXiv Detail & Related papers (2021-12-02T18:59:58Z)
- Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection [11.390163890611246]
Mask R-CNN is widely adopted as a strong baseline for arbitrary-shaped scene text detection and spotting.
There may exist multiple instances in one proposal, which makes it difficult for the mask head to distinguish different instances and degrades the performance.
We propose instance-aware mask learning in which the mask head learns to predict the shape of the whole instance rather than classify each pixel to text or non-text.
arXiv Detail & Related papers (2021-09-08T04:32:29Z)
- RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features [53.71163467683838]
RefineMask is a new method for high-quality instance segmentation of objects and scenes.
It incorporates fine-grained features during the instance-wise segmenting process in a multi-stage manner.
It succeeds in segmenting hard cases such as bent parts of objects that are over-smoothed by most previous methods.
arXiv Detail & Related papers (2021-04-17T15:09:20Z)
- DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation [50.70679435176346]
We propose a new mask representation by applying the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector.
Our method, termed DCT-Mask, could be easily integrated into most pixel-based instance segmentation methods.
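The idea of encoding a binary mask as a compact vector of low-frequency DCT coefficients can be sketched in pure NumPy as below. The function names, the `keep` parameter, and the low-frequency-block truncation are illustrative assumptions; in the actual method the vector is regressed by the network rather than computed from a ground-truth mask at inference time.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n), so D @ D.T == I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    M[0] /= np.sqrt(2.0)
    return M

def encode_mask(mask, keep=8):
    """Encode a binary n x n mask as a (keep*keep,) vector of
    low-frequency 2-D DCT coefficients."""
    n = mask.shape[0]
    D = dct_matrix(n)
    coeffs = D @ mask.astype(float) @ D.T   # separable 2-D DCT
    return coeffs[:keep, :keep].ravel()

def decode_mask(vec, n, keep=8, threshold=0.5):
    """Reconstruct the binary mask from the truncated coefficient vector."""
    coeffs = np.zeros((n, n))
    coeffs[:keep, :keep] = vec.reshape(keep, keep)
    D = dct_matrix(n)
    recon = D.T @ coeffs @ D                # inverse 2-D DCT
    return recon > threshold

# Round-trip a 32x32 mask through a 64-dimensional vector
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True
vec = encode_mask(mask, keep=8)
recon = decode_mask(vec, 32, keep=8)
print(vec.shape)  # (64,)
```

The compression is lossy at sharp boundaries, but because mask energy concentrates in low frequencies the truncated vector reconstructs the mask with high overlap at a fraction of the 32x32 = 1024 raw values.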
arXiv Detail & Related papers (2020-11-19T15:00:21Z)
- Mask Encoding for Single Shot Instance Segmentation [97.99956029224622]
We propose a simple single-shot instance segmentation framework, termed mask encoding based instance segmentation (MEInst).
Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact and fixed-dimensional representation vector.
We show that the much simpler and flexible one-stage instance segmentation method, can also achieve competitive performance.
arXiv Detail & Related papers (2020-03-26T02:51:17Z)
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [103.74690082121079]
In this work, we achieve improved mask prediction by effectively combining instance-level information with lower-level, fine-grained semantic information.
Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches.
BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer.
arXiv Detail & Related papers (2020-01-02T03:30:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.