Mask is All You Need: Rethinking Mask R-CNN for Dense and
Arbitrary-Shaped Scene Text Detection
- URL: http://arxiv.org/abs/2109.03426v1
- Date: Wed, 8 Sep 2021 04:32:29 GMT
- Title: Mask is All You Need: Rethinking Mask R-CNN for Dense and
Arbitrary-Shaped Scene Text Detection
- Authors: Xugong Qin, Yu Zhou, Youhui Guo, Dayan Wu, Zhihong Tian, Ning Jiang,
Hongbin Wang, Weiping Wang
- Abstract summary: Mask R-CNN is widely adopted as a strong baseline for arbitrary-shaped scene text detection and spotting.
There may exist multiple instances in one proposal, which makes it difficult for the mask head to distinguish different instances and degrades the performance.
We propose instance-aware mask learning in which the mask head learns to predict the shape of the whole instance rather than classifying each pixel as text or non-text.
- Score: 11.390163890611246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the large success in object detection and instance segmentation, Mask
R-CNN attracts great attention and is widely adopted as a strong baseline for
arbitrary-shaped scene text detection and spotting. However, two issues remain
to be settled. The first is the dense text case, which is easily neglected but
quite common in practice. Multiple instances may exist in one proposal, which
makes it difficult for the mask head to distinguish different instances and
degrades performance. In this work, we argue that the performance
degradation results from the learning confusion issue in the mask head. We
propose to use an MLP decoder instead of the "deconv-conv" decoder in the mask
head, which alleviates the issue and significantly improves robustness. We also
propose instance-aware mask learning in which the mask head learns to predict
the shape of the whole instance rather than classify each pixel as text or
non-text. With instance-aware mask learning, the mask branch can learn
separated and compact masks. The second is that due to large variations in
scale and aspect ratio, the RPN requires complicated anchor settings, which are
hard to maintain and transfer across different datasets. To address this issue,
we propose an adaptive label assignment in which all instances, especially those
with extreme aspect ratios, are guaranteed to be matched with enough anchors.
Equipped with these components, the proposed method named MAYOR achieves
state-of-the-art performance on five benchmarks including DAST1500, MSRA-TD500,
ICDAR2015, CTW1500, and Total-Text.
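The contrast between the standard "deconv-conv" decoder and an MLP decoder can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; all shapes and layer sizes are assumptions. The point is the connectivity pattern: every output pixel of an MLP decoder sees the whole RoI feature, rather than a local window.

```python
import numpy as np

def mlp_mask_decoder(roi_feat, w1, b1, w2, b2, out_size):
    """Predict a whole-instance mask from a flattened RoI feature.

    Unlike the "deconv-conv" decoder, whose transposed convolutions are
    spatially local, every output pixel here depends on the entire RoI
    feature -- the global view the paper credits with reducing learning
    confusion when several text instances fall inside one proposal.
    """
    x = roi_feat.reshape(-1)             # flatten C x H x W into a vector
    h = np.maximum(0.0, w1 @ x + b1)     # hidden layer with ReLU
    logits = w2 @ h + b2                 # one logit per output pixel
    return logits.reshape(out_size, out_size)

# Toy shapes (assumptions, not the paper's): a 32 x 7 x 7 RoI feature
# decoded to a 14 x 14 mask.
rng = np.random.default_rng(0)
c, s, hidden, out = 32, 7, 128, 14
feat = rng.standard_normal((c, s, s))
w1 = rng.standard_normal((hidden, c * s * s)) * 0.01
b1 = np.zeros(hidden)
w2 = rng.standard_normal((out * out, hidden)) * 0.01
b2 = np.zeros(out * out)
mask_logits = mlp_mask_decoder(feat, w1, b1, w2, b2, out)
print(mask_logits.shape)
```

In a real model the weights would be trained end to end with the rest of the detector; the sketch only shows the forward pass.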
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- DynaMask: Dynamic Mask Selection for Instance Segmentation [21.50329070835023]
We develop a Mask Switch Module (MSM) with negligible computational cost to select the most suitable mask resolution for each instance.
The proposed method, namely DynaMask, brings consistent and noticeable performance improvements over other state-of-the-art methods at a moderate computational overhead.
arXiv Detail & Related papers (2023-03-14T13:01:25Z)
- MP-Former: Mask-Piloted Transformer for Image Segmentation [16.620469868310288]
Mask2Former suffers from inconsistent mask predictions between decoder layers.
We propose a mask-piloted training approach, which feeds noised ground-truth masks in masked-attention and trains the model to reconstruct the original ones.
arXiv Detail & Related papers (2023-03-13T17:57:59Z)
- Mask Transfiner for High-Quality Instance Segmentation [95.74244714914052]
We present Mask Transfiner for high-quality and efficient instance segmentation.
Our approach only processes detected error-prone tree nodes and self-corrects their errors in parallel.
Our code and trained models will be available at http://vis.xyz/pub/transfiner.
arXiv Detail & Related papers (2021-11-26T18:58:22Z)
- BoxInst: High-Performance Instance Segmentation with Box Annotations [102.10713189544947]
We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training.
Our core idea is to redesign the loss for learning masks in instance segmentation, with no modification to the segmentation network itself.
arXiv Detail & Related papers (2020-12-03T22:27:55Z)
- DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation [50.70679435176346]
We propose a new mask representation by applying the discrete cosine transform(DCT) to encode the high-resolution binary grid mask into a compact vector.
Our method, termed DCT-Mask, could be easily integrated into most pixel-based instance segmentation methods.
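The DCT-Mask idea of compressing a binary grid mask into a compact coefficient vector can be sketched in NumPy. This is a simplified toy: DCT-Mask selects low-frequency coefficients in zigzag order from a high-resolution mask, while this version keeps a square low-frequency block of a small mask.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (rows are cosine basis vectors).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def encode_mask(mask, keep):
    """Encode an n x n binary mask as its top-left keep x keep DCT block.

    DCT-Mask proper orders low-frequency coefficients in a zigzag scan;
    a square block is a simplification with the same flavor.
    """
    n = mask.shape[0]
    d = dct_matrix(n)
    coeffs = d @ mask @ d.T              # separable 2D DCT
    return coeffs[:keep, :keep].flatten()

def decode_mask(vec, n, keep):
    d = dct_matrix(n)
    coeffs = np.zeros((n, n))
    coeffs[:keep, :keep] = vec.reshape(keep, keep)
    recon = d.T @ coeffs @ d             # inverse 2D DCT
    return (recon > 0.5).astype(np.uint8)

mask = np.zeros((16, 16), dtype=np.uint8)
mask[4:12, 4:12] = 1                     # a toy square "text" region
full = encode_mask(mask, keep=16)        # keep all 256 coefficients
recon = decode_mask(full, 16, keep=16)
print(np.array_equal(recon, mask))       # lossless round trip
compact = encode_mask(mask, keep=8)      # 64 numbers instead of 256
print(compact.shape)
```

Because the orthonormal DCT is exactly invertible, keeping all coefficients reconstructs the mask perfectly; dropping high-frequency coefficients trades a small boundary error for a much more compact regression target.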
arXiv Detail & Related papers (2020-11-19T15:00:21Z)
- PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance imposes a heavy computational burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [103.74690082121079]
In this work, we achieve improved mask prediction by effectively combining instance-level information with lower-level, fine-grained semantic information.
Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches.
BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer.
arXiv Detail & Related papers (2020-01-02T03:30:17Z)
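The blender idea above, combining a few shared position-sensitive bases with per-instance attention maps, reduces to a single tensor contraction. The sketch below uses illustrative shapes and omits details of BlendMask's actual module, which also crops the bases to each box and upsamples the attentions.

```python
import numpy as np

def blend(bases, attns):
    """Combine shared bases with per-instance attention maps.

    bases: (K, H, W) position-sensitive score maps from the bottom-up
           branch, shared by all instances.
    attns: (N, K, H, W) per-instance attention maps from the top-down
           branch (assumed already aligned to the bases' resolution).
    Returns (N, H, W) per-instance mask logits: each instance's mask is a
    weighted sum of the K bases under its own attention.
    """
    return np.einsum('nkhw,khw->nhw', attns, bases)

rng = np.random.default_rng(0)
K, H, W, N = 4, 28, 28, 3              # a handful of bases suffices
bases = rng.standard_normal((K, H, W))
attns = rng.standard_normal((N, K, H, W))
masks = blend(bases, attns)
print(masks.shape)
```

Because the bases are shared, per-instance cost is only the attention maps and the contraction, which is what lets BlendMask keep the channel count very low.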
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.