Adaptive Shrink-Mask for Text Detection
- URL: http://arxiv.org/abs/2111.09560v1
- Date: Thu, 18 Nov 2021 07:38:57 GMT
- Title: Adaptive Shrink-Mask for Text Detection
- Authors: Chuang Yang, Mulin Chen, Yuan Yuan, Qi Wang, Xuelong Li
- Abstract summary: Existing real-time text detectors reconstruct text contours by shrink-masks directly.
The dependence on predicted shrink-masks leads to unstable detection results.
Super-pixel Window (SPW) is designed to supervise the network.
- Score: 91.34459257409104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing real-time text detectors reconstruct text contours by shrink-masks
directly, which simplifies the framework and can make the model run fast.
However, the strong dependence on predicted shrink-masks leads to unstable
detection results. Moreover, the discrimination of shrink-masks is a pixelwise
prediction task. Supervising the network with shrink-masks alone loses much
semantic context, which leads to false detection of shrink-masks. To
address these problems, we construct an efficient text detection network,
Adaptive Shrink-Mask for Text Detection (ASMTD), which improves the accuracy
during training and reduces the complexity of the inference process. First,
the Adaptive Shrink-Mask (ASM) is proposed to represent texts by shrink-masks
and independent adaptive offsets. It weakens the coupling of texts to
shrink-masks, which improves the robustness of detection results. Then, the
Super-pixel Window (SPW) is designed to supervise the network. It utilizes the
surroundings of each pixel to improve the reliability of predicted shrink-masks
and does not appear during testing. In the end, a lightweight feature merging
branch is constructed to reduce the computational cost. As demonstrated in the
experiments, our method is superior to existing state-of-the-art (SOTA) methods
in both detection accuracy and speed on multiple benchmarks.
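To make the shrink-mask idea concrete, here is a minimal sketch of how shrink-mask representations are commonly built and inverted. It assumes the Vatti-style offset D = A(1 - r^2)/L used by PSENet and DB (A is the polygon area, L its perimeter, r the shrink ratio, 0.4 in DB); the abstract does not state ASMTD's own formula. To avoid a polygon-clipping library, the sketch only shrinks axis-aligned boxes, and `reconstruct_box` is a hypothetical stand-in for ASM's independently predicted offset: here the offset is simply passed in rather than predicted by a network.

```python
import math

def polygon_area(pts):
    """Absolute polygon area via the shoelace formula."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(pts):
    """Sum of edge lengths of a closed polygon."""
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))

def shrink_offset(pts, r=0.4):
    """Inward offset D = A * (1 - r^2) / L (PSENet/DB convention, assumed here)."""
    return polygon_area(pts) * (1.0 - r * r) / polygon_perimeter(pts)

def shrink_box(box, d):
    """Shrink an axis-aligned box (x0, y0, x1, y1) by d on every side.
    Rectangle-only simplification; general polygons need Vatti clipping."""
    x0, y0, x1, y1 = box
    return (x0 + d, y0 + d, x1 - d, y1 - d)

def reconstruct_box(shrunk, offset):
    """Expand a shrink-mask box by a separately supplied offset.
    Hypothetical stand-in for ASM's independent adaptive offset."""
    x0, y0, x1, y1 = shrunk
    return (x0 - offset, y0 - offset, x1 + offset, y1 + offset)

# Usage: a 100 x 20 text box.
text_box = (0.0, 0.0, 100.0, 20.0)
corners = [(0, 0), (100, 0), (100, 20), (0, 20)]
d = shrink_offset(corners)            # 2000 * 0.84 / 240, about 7.0
mask_box = shrink_box(text_box, d)    # about (7, 7, 93, 13)
restored = reconstruct_box(mask_box, d)
```

When the expansion offset is supplied independently of the shrink-mask's geometry, reconstruction does not rely on a fixed expansion ratio derived from the predicted mask, which is the kind of decoupling the abstract attributes to ASM.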
Related papers
- Real-Time Text Detection with Similar Mask in Traffic, Industrial, and Natural Scenes [31.180352896153682]
We propose an efficient multi-scene text detector that contains an effective text representation, the similar mask (SM), and a feature correction module (FCM).
To validate SM-Net across different scenes, we conduct experiments on traffic, industrial, and natural scene datasets.
(2024-11-05T04:08:59Z)
- Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision [87.15580604023555]
Unpair-Seg is a novel weakly-supervised open-vocabulary segmentation framework.
It learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected.
It achieves 14.6% and 19.5% mIoU on the ADE-847 and PASCAL Context-459 datasets.
(2024-02-14T06:01:44Z)
- Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
(2023-12-22T02:31:31Z)
- Hard Nominal Example-aware Template Mutual Matching for Industrial Anomaly Detection [74.9262846410559]
Hard Nominal Example-aware Template Mutual Matching (HETMM).
HETMM aims to construct a robust prototype-based decision boundary, which can precisely distinguish between hard-nominal examples and anomalies.
(2023-03-28T17:54:56Z)
- Zoom Text Detector [26.761735112547953]
Text detectors adopt shrink-mask based representation strategies.
Unfortunately, three disadvantages cause unreliable shrink-masks.
We propose a Zoom Text Detector inspired by the zoom process of the camera.
(2022-09-07T09:19:21Z)
- Real-Time Mask Detection Based on SSD-MobileNetV2 [2.538209532048867]
A reliable automatic real-time mask detection system can greatly reduce the workload of relevant staff.
Existing mask detection approaches are resource-intensive and do not achieve a good balance between speed and accuracy.
In this paper, we propose a new architecture for mask detection.
(2022-08-29T01:59:22Z)
- TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask [19.269070203448187]
Arbitrary-shaped scene text detection is a challenging task due to the variety of text changes in font, size, color, and orientation.
We propose a novel light-weight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode the text masks as compact vectors.
TextDCT achieves F-measure of 85.1 at 17.2 frames per second (FPS) and F-measure of 84.9 at 15.1 FPS for CTW1500 and Total-Text datasets, respectively.
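The DCT mask encoding mentioned above can be sketched as follows: apply a 2D orthonormal DCT-II to a fixed-size binary mask and keep only the low-frequency coefficients as a compact vector. The mask size, the number of kept coefficients, and the top-left k x k selection (a simplification of zigzag ordering) are assumptions for illustration, not TextDCT's published settings.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def encode_mask(mask, k):
    """2D DCT (C X C^T) of a square mask; keep the top-left k x k block as a vector."""
    c = dct_matrix(len(mask))
    coeffs = matmul(matmul(c, mask), transpose(c))
    return [coeffs[u][v] for u in range(k) for v in range(k)]

def decode_mask(vec, n, k):
    """Inverse DCT (C^T Y C) with all discarded coefficients set to zero."""
    coeffs = [[0.0] * n for _ in range(n)]
    for idx, val in enumerate(vec):
        coeffs[idx // k][idx % k] = val
    c = dct_matrix(n)
    return matmul(matmul(transpose(c), coeffs), c)

# Usage: an 8 x 8 mask containing a 4 x 6 text region.
n = 8
mask = [[1.0 if 2 <= i < 6 and 1 <= j < 7 else 0.0 for j in range(n)] for i in range(n)]
full = decode_mask(encode_mask(mask, n), n, n)   # all 64 coefficients: exact up to rounding
compact = encode_mask(mask, 4)                   # 16-number vector
approx = decode_mask(compact, n, 4)
binary = [[1 if v > 0.5 else 0 for v in row] for row in approx]
```

Keeping fewer coefficients shortens the vector a network has to regress, at the cost of smoothing fine contour detail; thresholding the decoded map recovers a binary mask.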
(2022-06-27T15:42:25Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
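The idea behind the Differentiable Binarization (DB) module can be shown in a few lines: standard binarization is a hard step on the probability map P against a threshold map T, which has no useful gradient, while DB replaces it with the steep sigmoid B = 1 / (1 + exp(-k(P - T))). The sketch below assumes k = 50 (the value reported for DB) and P, T in [0, 1]; in DB, T is itself predicted by the network, whereas here it is passed in as a constant list.

```python
import math

def standard_binarization(p, t):
    """Hard step: 1 where P >= T, else 0 (not differentiable)."""
    return [[1.0 if pv >= tv else 0.0 for pv, tv in zip(pr, tr)] for pr, tr in zip(p, t)]

def differentiable_binarization(p, t, k=50.0):
    """B = 1 / (1 + exp(-k (P - T))), computed element-wise.
    k = 50 follows the DB paper; P and T are assumed to lie in [0, 1]."""
    return [[1.0 / (1.0 + math.exp(-k * (pv - tv))) for pv, tv in zip(pr, tr)]
            for pr, tr in zip(p, t)]

# Usage: one row of probabilities against a constant threshold.
prob = [[0.9, 0.3, 0.2]]
thresh = [[0.3, 0.3, 0.3]]
hard = standard_binarization(prob, thresh)        # [[1.0, 1.0, 0.0]]
soft = differentiable_binarization(prob, thresh)  # close to hard, but smooth in P and T
```

Because B is smooth in both P and T, the binarization step can sit inside the training graph, which is what allows a segmentation network to learn its own per-pixel thresholds.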
(2022-02-21T15:30:14Z)
- OLED: One-Class Learned Encoder-Decoder Network with Adversarial Context Masking for Novelty Detection [1.933681537640272]
Novelty detection is the task of recognizing samples that do not belong to the distribution of the target class.
Deep autoencoders have been widely used as a base of many unsupervised novelty detection methods.
We have designed a framework consisting of two competing networks, a Mask Module and a Reconstructor.
(2021-03-27T17:59:40Z)
- Suppressing Uncertainties for Large-Scale Facial Expression Recognition [81.51495681011404]
This paper proposes a simple yet efficient Self-Cure Network (SCN) which suppresses the uncertainties efficiently and prevents deep networks from over-fitting uncertain facial images.
Results on public benchmarks demonstrate that our SCN outperforms current state-of-the-art methods with 88.14% on RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus.
(2020-02-24T17:24:36Z)