Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
- URL: http://arxiv.org/abs/2007.09482v1
- Date: Sat, 18 Jul 2020 17:25:50 GMT
- Title: Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
- Authors: Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai
- Abstract summary: Most arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals.
Our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy won't be affected by nearby text or background noise.
- Score: 71.6244869235243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent end-to-end trainable methods for scene text spotting, integrating
detection and recognition, have shown much progress. However, most of the current
arbitrary-shape scene text spotters use region proposal networks (RPN) to
produce proposals. RPN relies heavily on manually designed anchors and its
proposals are represented with axis-aligned rectangles. The former presents
difficulties in handling text instances of extreme aspect ratios or irregular
shapes, and the latter often groups multiple neighboring instances into a
single proposal, in cases of densely oriented text. To tackle these problems,
we propose Mask TextSpotter v3, an end-to-end trainable scene text spotter that
adopts a Segmentation Proposal Network (SPN) instead of an RPN. Our SPN is
anchor-free and gives accurate representations of arbitrary-shape proposals. It
is therefore superior to RPN in detecting text instances of extreme aspect
ratios or irregular shapes. Furthermore, the accurate proposals produced by SPN
allow masked RoI features to be used for decoupling neighboring text instances.
As a result, our Mask TextSpotter v3 can handle text instances of extreme
aspect ratios or irregular shapes, and its recognition accuracy won't be
affected by nearby text or background noise. Specifically, we outperform
state-of-the-art methods by 21.9 percent on the Rotated ICDAR 2013 dataset
(rotation robustness), 5.9 percent on the Total-Text dataset (shape
robustness), and achieve state-of-the-art performance on the MSRA-TD500 dataset
(aspect ratio robustness). Code is available at:
https://github.com/MhLiao/MaskTextSpotterV3
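The abstract's two ideas (anchor-free proposals read off a segmentation map, and masked RoI features that keep a neighbor out of the proposal) can be illustrated with a minimal NumPy sketch. It rests on loud simplifying assumptions: a hand-made probability map stands in for the SPN's predicted segmentation map, proposals are 4-connected components above a hypothetical 0.5 threshold, and the "features" are a toy array. All helper names are illustrative and are not taken from the MaskTextSpotterV3 code.

```python
import numpy as np
from collections import deque

def connected_components(binary):
    """Label 4-connected foreground regions with BFS; returns (labels, count)."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:
                count += 1
                labels[y, x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

def segmentation_proposals(prob_map, threshold=0.5):
    """Anchor-free proposals (assumed scheme): one binary mask per connected text region."""
    labels, count = connected_components(prob_map > threshold)
    return [labels == k for k in range(1, count + 1)]

def box_roi(features, mask):
    """RPN-style view: crop the axis-aligned bounding box of the proposal."""
    ys, xs = np.nonzero(mask)
    return features[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def masked_roi(features, mask):
    """Hard RoI masking: same crop, but zero every feature outside the proposal's mask."""
    ys, xs = np.nonzero(mask)
    sl = (slice(ys.min(), ys.max() + 1), slice(xs.min(), xs.max() + 1))
    return features[sl] * mask[sl]

# Hypothetical data: two diagonal text bands lying close together.
yy, xx = np.mgrid[0:8, 0:8]
band_a = np.isin(xx - yy, [0, 1])        # instance A
band_b = np.isin(yy - xx, [3, 4])        # instance B, just below A
prob_map = np.where(band_a | band_b, 0.9, 0.1)
features = band_a * 1.0 + band_b * 2.0   # toy features: 1 marks A, 2 marks B

proposals = segmentation_proposals(prob_map)
print(len(proposals))                               # 2
print(1.0 in box_roi(features, proposals[1]))       # True: B's box also contains A's pixels
print(1.0 in masked_roi(features, proposals[1]))    # False: the mask removes them
```

The box crop for instance B picks up pixels of the neighboring instance A, which is the failure mode the abstract attributes to axis-aligned proposals; multiplying by the proposal's own mask leaves only B.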
Related papers
- Adaptive Segmentation Network for Scene Text Detection [0.0]
We propose to automatically learn a discriminative segmentation threshold, which distinguishes text pixels from background pixels, for segmentation-based scene text detectors.
In addition, we design a Global-information Enhanced Feature Pyramid Network (GE-FPN) for capturing text instances with macro size and extreme aspect ratios.
Finally, combining the proposed threshold learning strategy and text detection structure, we design an Adaptive Segmentation Network (ASNet) for scene text detection.
arXiv Detail & Related papers (2023-07-27T17:37:56Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask [19.269070203448187]
Arbitrary-shaped scene text detection is challenging because text varies widely in font, size, color, and orientation.
We propose a novel light-weight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode the text masks as compact vectors.
TextDCT achieves an F-measure of 85.1 at 17.2 frames per second (FPS) on CTW1500 and an F-measure of 84.9 at 15.1 FPS on Total-Text.
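The idea of encoding a mask as a compact DCT vector can be sketched in a few lines of NumPy: a square binary mask is transformed with an orthonormal 2-D DCT-II, only a low-frequency block of coefficients is kept, and the mask is recovered with the inverse transform plus a threshold. This is a sketch under assumptions (square masks, a simple keep x keep corner block rather than any particular coefficient ordering, a 0.5 threshold, a toy mask); TextDCT's actual encoding details may differ, and the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def encode_mask(mask, keep=8):
    """2-D DCT of a square binary mask, keeping only the keep x keep low-frequency block."""
    c = dct_matrix(mask.shape[0])
    coeffs = c @ mask.astype(float) @ c.T
    return coeffs[:keep, :keep].ravel()

def decode_mask(vector, size, keep=8, threshold=0.5):
    """Inverse 2-D DCT with the dropped high-frequency coefficients set to zero."""
    c = dct_matrix(size)
    coeffs = np.zeros((size, size))
    coeffs[:keep, :keep] = vector.reshape(keep, keep)
    return (c.T @ coeffs @ c) > threshold

# Hypothetical 32 x 32 mask of a tilted text band.
yy, xx = np.mgrid[0:32, 0:32]
mask = np.abs(yy - 0.5 * xx - 8) < 4

vec = encode_mask(mask, keep=8)
approx = decode_mask(vec, 32, keep=8)
iou = (approx & mask).sum() / (approx | mask).sum()
print(vec.shape)             # (64,)
print(round(float(iou), 3))  # how well 64 numbers describe a 1024-pixel mask
```

Because the basis is orthonormal, keeping all coefficients reproduces the mask exactly; truncation trades shape detail for a shorter, fixed-length vector that a detector head can regress directly.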
arXiv Detail & Related papers (2022-06-27T15:42:25Z)
- CentripetalText: An Efficient Text Instance Representation for Scene Text Detection [19.69057252363207]
We propose an efficient text instance representation named CentripetalText (CT).
CT decomposes text instances into the combination of text kernels and centripetal shifts.
For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods.
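A minimal sketch of how a kernels-plus-shifts representation could be decoded into instances, assuming a kernel label map and per-pixel (dy, dx) shifts have already been predicted (hand-written below): each text pixel outside a kernel is moved by its shift, rounded to the nearest pixel, and adopts the label of the kernel it lands in; pixels that land outside every kernel stay unlabeled. The single rounding step and the function name are illustrative assumptions, not CT's published post-processing.

```python
import numpy as np

def group_by_centripetal_shift(text_mask, kernel_labels, shifts):
    """Assign each non-kernel text pixel to the kernel its predicted shift points into.

    text_mask:     (H, W) bool, pixels predicted as text
    kernel_labels: (H, W) int, 0 = no kernel, k > 0 = kernel id
    shifts:        (H, W, 2) float, predicted (dy, dx) from each pixel toward its kernel
    """
    h, w = text_mask.shape
    instances = kernel_labels.copy()
    ys, xs = np.nonzero(text_mask & (kernel_labels == 0))
    for y, x in zip(ys, xs):
        ty = int(round(y + shifts[y, x, 0]))
        tx = int(round(x + shifts[y, x, 1]))
        if 0 <= ty < h and 0 <= tx < w:
            instances[y, x] = kernel_labels[ty, tx]  # stays 0 if it misses every kernel
    return instances

# Hypothetical 1 x 7 strip with kernel 1 at x=1 and kernel 2 at x=5.
text_mask = np.ones((1, 7), dtype=bool)
kernel_labels = np.array([[0, 1, 0, 0, 0, 2, 0]])
shifts = np.zeros((1, 7, 2))
shifts[0, :, 1] = [1, 0, -1, -2, 1, 0, -1]   # horizontal shifts toward each kernel

instances = group_by_centripetal_shift(text_mask, kernel_labels, shifts)
print(instances)   # [[1 1 1 1 2 2 2]]
```

Unlike a breadth-first expansion, this grouping is a single independent lookup per pixel, which is one reason such shift-based decoding can be fast.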
arXiv Detail & Related papers (2021-07-13T09:34:18Z)
- PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text [85.7020597476857]
We propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes.
PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels.
As a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which makes it well suited to real-time applications.
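Training labels for a kernel ("central region") are commonly produced by shrinking each annotated text polygon inward. To our understanding, PSENet and PAN use the offset d = A(1 - r^2)/L, where A is the polygon area, L its perimeter, and r the shrink ratio; r = 0.5 below is an assumption for PAN++. The sketch only computes d with the shoelace formula; the actual polygon offsetting (for example Vatti clipping) is omitted, and the helper names are hypothetical.

```python
import math

def polygon_area(points):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(points):
    """Sum of edge lengths, closing the polygon back to its first vertex."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))

def kernel_shrink_offset(points, ratio=0.5):
    """Inward offset d = A * (1 - r^2) / L used to shrink a text polygon to its kernel."""
    return polygon_area(points) * (1 - ratio ** 2) / polygon_perimeter(points)

# Hypothetical 100 x 20 horizontal word box.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
d = kernel_shrink_offset(box)
print(d)                          # 6.25
print((100 - 2 * d, 20 - 2 * d))  # (87.5, 7.5): size of the resulting kernel
```

Dividing by the perimeter makes the offset scale with the polygon's thickness rather than its length, so long thin text lines still keep a non-empty kernel.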
arXiv Detail & Related papers (2021-05-02T07:04:30Z)
- PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network [54.03560668182197]
We propose a novel fully convolutional Point Gathering Network (PGNet) for reading arbitrarily-shaped text in real-time.
With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations.
Experiments show that the proposed method achieves competitive accuracy while significantly improving running speed.
arXiv Detail & Related papers (2021-04-12T13:27:34Z)
- Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach [34.63444886780274]
Text segmentation is a prerequisite for many real-world text-related tasks.
We introduce Text Refinement Network (TexRNet), a novel text segmentation approach.
TexRNet consistently improves text segmentation performance by nearly 2% compared to other state-of-the-art segmentation methods.
arXiv Detail & Related papers (2020-11-27T22:50:09Z)
- All you need is a second look: Towards Tighter Arbitrary shape text detection [80.85188469964346]
Long curved text instances tend to be fragmented because of the limited receptive field of CNNs.
Simple representations using rectangular or quadrangular bounding boxes fall short when dealing with more challenging arbitrary-shaped text.
NASK reconstructs text instances with a tighter representation using the predicted geometric attributes.
arXiv Detail & Related papers (2020-04-26T17:03:41Z)
- ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection [147.10751375922035]
We propose the ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with high response value in both directions.
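The "high response in both directions" rule can be illustrated with a deliberately simplified sketch: given a horizontal and a vertical contour response for each candidate point, a point is kept only if both exceed a threshold. In ContourNet these responses come from a learned network; the numbers, the 0.5 threshold, and the function name below are hypothetical, and the point is only that a texture responding strongly in one direction alone gets rejected.

```python
import numpy as np

def orthogonal_consistency(resp_h, resp_v, threshold=0.5):
    """Keep a candidate contour point only if BOTH directional responses are high."""
    return (resp_h > threshold) & (resp_v > threshold)

# Hypothetical responses for three candidate points:
# a real text edge, a horizontal stripe texture, and a vertical stripe texture.
resp_h = np.array([0.9, 0.95, 0.1])
resp_v = np.array([0.8, 0.05, 0.9])
print(orthogonal_consistency(resp_h, resp_v))   # [ True False False]
```

Requiring agreement between two orthogonal views acts as a cheap consistency check: a stripe-like background pattern can fool one directional detector but rarely both.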
arXiv Detail & Related papers (2020-04-10T08:15:23Z)