All You Need is a Second Look: Towards Tighter Arbitrary-Shaped Text
Detection
- URL: http://arxiv.org/abs/2004.12436v1
- Date: Sun, 26 Apr 2020 17:03:41 GMT
- Title: All You Need is a Second Look: Towards Tighter Arbitrary-Shaped Text
Detection
- Authors: Meng Cao, Yuexian Zou
- Abstract summary: Long curved text instances tend to be fragmented because of the limited receptive field size of CNNs.
Simple representations using rectangle or quadrangle bounding boxes fall short when dealing with more challenging arbitrary-shaped texts.
\textit{NASK} reconstructs text instances with a tighter representation using the predicted geometrical attributes.
- Score: 80.85188469964346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning-based scene text detection methods have progressed
substantially in recent years. However, several problems remain. Generally,
long curved text instances tend to be fragmented because of the limited
receptive field size of CNNs. Besides, simple representations using
rectangle or quadrangle bounding boxes fall short when dealing with more
challenging arbitrary-shaped texts. In addition, the scale of text instances
varies greatly, which makes accurate prediction through a single segmentation
network difficult. To address these problems, we propose a two-stage
segmentation-based arbitrary-shaped text detector named \textit{NASK}
(\textbf{N}eed \textbf{A} \textbf{S}econd loo\textbf{K}). Specifically,
\textit{NASK} consists of a Text Instance Segmentation network namely
\textit{TIS} (\(1^{st}\) stage), a Text RoI Pooling module and a Fiducial pOint
eXpression module termed as \textit{FOX} (\(2^{nd}\) stage). Firstly,
\textit{TIS} conducts instance segmentation to obtain rectangle text proposals
with a proposed Group Spatial and Channel Attention module (\textit{GSCA}) to
augment the feature expression. Then, Text RoI Pooling transforms these
rectangles to a fixed size. Finally, \textit{FOX} is introduced to
reconstruct text instances with a tighter representation using the
predicted geometrical attributes, including text center line, text line
orientation, character scale and character orientation. Experimental results on
two public benchmarks, \textit{Total-Text} and \textit{SCUT-CTW1500},
demonstrate that the proposed \textit{NASK} achieves state-of-the-art
results.
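The two-stage pipeline described above (TIS with GSCA attention producing rectangle proposals, Text RoI Pooling resizing them, FOX reconstructing tighter polygons from geometric attributes) can be sketched in outline. This is a minimal illustrative sketch only: every class, function, and returned value below is a placeholder assumed for illustration, not the authors' implementation.

```python
# Hedged sketch of the NASK two-stage flow from the abstract.
# All names and values here are illustrative placeholders, not the paper's code.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FiducialPoints:
    """Geometric attributes FOX predicts for one text instance (2nd stage)."""
    center_line: List[Tuple[int, int]]  # sampled points along the text center line
    line_orientation: float             # text line orientation (radians)
    char_scale: float                   # character scale
    char_orientation: float             # character orientation (radians)


def tis_stage(image) -> List[Tuple[int, int, int, int]]:
    """1st stage: Text Instance Segmentation (TIS), with the GSCA attention
    module augmenting features, yields rectangle proposals (x, y, w, h).
    Placeholder: returns one dummy proposal."""
    return [(10, 20, 100, 30)]


def text_roi_pool(image, rect, size=(8, 32)):
    """Text RoI Pooling: warp each rectangle proposal to a fixed size.
    Placeholder: returns a dummy fixed-size feature map."""
    return [[0.0] * size[1] for _ in range(size[0])]


def fox_stage(feature) -> FiducialPoints:
    """2nd stage: FOX reconstructs a tighter representation from the
    predicted geometric attributes. Placeholder values only."""
    return FiducialPoints(center_line=[(0, 0), (1, 0)],
                          line_orientation=0.0,
                          char_scale=1.0,
                          char_orientation=0.0)


def detect(image) -> List[FiducialPoints]:
    """End-to-end flow: TIS -> Text RoI Pooling -> FOX, one result per proposal."""
    return [fox_stage(text_roi_pool(image, rect)) for rect in tis_stage(image)]
```

The key design point the abstract emphasizes is the "second look": the first stage only has to localize coarse rectangles, while the second stage refines each fixed-size region into a tight arbitrary-shaped outline.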
Related papers
- Contextual Text Block Detection towards Scene Text Understanding [85.40898487745272]
This paper presents contextual text detection, a new setup that detects contextual text blocks (CTBs) for better understanding of texts in scenes.
We formulate the new setup by a dual detection task which first detects integral text units and then groups them into a CTB.
To this end, we design a novel scene text clustering technique that treats integral text units as tokens and groups them (belonging to the same CTB) into an ordered token sequence.
arXiv Detail & Related papers (2022-07-26T14:59:25Z) - CORE-Text: Improving Scene Text Detection with Contrastive Relational
Reasoning [65.57338873921168]
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision.
In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module.
We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text.
arXiv Detail & Related papers (2021-12-14T16:22:25Z) - CentripetalText: An Efficient Text Instance Representation for Scene
Text Detection [19.69057252363207]
We propose an efficient text instance representation named CentripetalText (CT).
CT decomposes text instances into the combination of text kernels and centripetal shifts.
For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods.
arXiv Detail & Related papers (2021-07-13T09:34:18Z) - All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection [39.17648241471479]
In this paper, we propose a two-stage segmentation-based detector, termed as NASK (Need A Second looK), for arbitrary-shaped text detection.
arXiv Detail & Related papers (2021-06-24T01:44:10Z) - BOTD: Bold Outline Text Detector [85.33700624095181]
We propose a new one-stage text detector, termed Bold Outline Text Detector (BOTD).
BOTD is able to process the arbitrary-shaped text with low model complexity.
Experimental results on three real-world benchmarks show the state-of-the-art performance of BOTD.
arXiv Detail & Related papers (2020-11-30T11:54:14Z) - Rethinking Text Segmentation: A Novel Dataset and A Text-Specific
Refinement Approach [34.63444886780274]
Text segmentation is a prerequisite in real-world text-related tasks.
We introduce Text Refinement Network (TexRNet), a novel text segmentation approach.
TexRNet consistently improves text segmentation performance by nearly 2% compared to other state-of-the-art segmentation methods.
arXiv Detail & Related papers (2020-11-27T22:50:09Z) - Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text
Spotting [71.6244869235243]
Most arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals.
Our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy won't be affected by nearby text or background noise.
arXiv Detail & Related papers (2020-07-18T17:25:50Z) - ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene
Text Detection [147.10751375922035]
We propose the ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with high response value in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.