All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection
- URL: http://arxiv.org/abs/2106.12720v1
- Date: Thu, 24 Jun 2021 01:44:10 GMT
- Title: All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection
- Authors: Meng Cao, Can Zhang, Dongming Yang, Yuexian Zou
- Abstract summary: In this paper, we propose a two-stage segmentation-based detector, termed NASK (Need A Second looK), for arbitrary-shaped text detection.
- Score: 39.17648241471479
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Arbitrary-shaped text detection is a challenging task since curved texts in the wild have complex geometric layouts. Existing mainstream methods follow the instance segmentation pipeline to obtain the text regions. However, arbitrary-shaped texts are difficult to depict with a single segmentation network because of their varying scales. In this paper, we propose a two-stage segmentation-based detector, termed NASK (Need A Second looK), for arbitrary-shaped text detection. Compared to a traditional single-stage segmentation network, NASK conducts detection in a coarse-to-fine manner: the first-stage segmentation spots rectangular text proposals, and the second stage retrieves compact representations. Specifically, NASK is composed of a Text Instance Segmentation (TIS) network (1st stage), a Geometry-aware Text RoI Alignment (GeoAlign) module, and a Fiducial pOint eXpression (FOX) module (2nd stage). First, TIS extracts augmented features with a novel Group Spatial and Channel Attention (GSCA) module and conducts instance segmentation to obtain rectangular proposals. Then, GeoAlign converts these rectangles to a fixed size and encodes an RoI-wise feature representation. Finally, FOX decomposes each text instance into several pivotal geometrical attributes to refine the detection results. Extensive experiments on three public benchmarks, Total-Text, SCUT-CTW1500, and ICDAR 2015, verify that NASK outperforms recent state-of-the-art methods.
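The abstract describes a coarse-to-fine pipeline (TIS for coarse rectangular proposals, GeoAlign for fixed-size RoI features, FOX for fiducial-point refinement). The sketch below is a minimal, hypothetical wiring of such a two-stage detector in PyTorch: it uses a generic multi-head attention layer as a stand-in for GSCA, plain RoIAlign as a stand-in for GeoAlign, and an arbitrary number of fiducial points. All module internals, shapes, and hyper-parameters are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a coarse-to-fine, two-stage text detector in the spirit of NASK.
# All internals are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torchvision.ops as ops


class NASKSketch(nn.Module):
    def __init__(self, feat_dim=256, roi_size=7, num_fiducial_points=8):
        super().__init__()
        # Stage 1 (TIS stand-in): shared features plus an attention block
        # standing in for GSCA; the segmentation head is omitted here.
        self.backbone = nn.Conv2d(3, feat_dim, 3, padding=1)
        self.attention = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.roi_size = roi_size
        # Stage 2 (FOX stand-in): regress per-proposal geometrical attributes,
        # here 2D offsets for a fixed number of fiducial points.
        self.fox_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim * roi_size * roi_size, 2 * num_fiducial_points),
        )

    def forward(self, images, proposals):
        # images: (B, 3, H, W); proposals: list of (N_i, 4) xyxy rectangles from stage 1.
        feats = self.backbone(images)
        b, c, h, w = feats.shape
        # Self-attention over flattened spatial positions (GSCA stand-in).
        seq = feats.flatten(2).transpose(1, 2)            # (B, H*W, C)
        seq, _ = self.attention(seq, seq, seq)
        feats = seq.transpose(1, 2).reshape(b, c, h, w)
        # GeoAlign stand-in: RoIAlign crops each rectangle to a fixed spatial size.
        rois = ops.roi_align(feats, proposals, output_size=self.roi_size, spatial_scale=1.0)
        # FOX stand-in: predict fiducial-point offsets that refine each proposal.
        return self.fox_head(rois)                        # (sum N_i, 2 * num_fiducial_points)


# Usage: one image and two hypothetical coarse rectangles from stage 1.
model = NASKSketch()
img = torch.randn(1, 3, 64, 64)
boxes = [torch.tensor([[4.0, 4.0, 30.0, 20.0], [10.0, 30.0, 60.0, 50.0]])]
print(model(img, boxes).shape)  # torch.Size([2, 16])
```

In the actual method, the stage-1 rectangles would come from the TIS instance-segmentation output rather than being supplied externally as in this toy usage.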
Related papers
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting [108.93803186429017]
End-to-end text-spotting aims to integrate detection and recognition in a unified framework.
Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2)
Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text with a parameterized Bezier curve, which, compared with segmentation-based methods, provides not only structured output but also a controllable representation.
Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the-art performance.
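ABCNet's central idea is the Bezier-curve parameterization of curved text boundaries. As a minimal illustration (not the authors' code), the snippet below samples points along a cubic Bezier curve from four control points; the control-point coordinates are invented for the example, and the cubic (degree-3) choice follows the ABCNet papers.

```python
# Sampling points on a cubic Bezier curve from its control points.
# Control-point values are hypothetical, for illustration only.
import numpy as np


def cubic_bezier(control_points, num_samples=10):
    """Sample points on a cubic Bezier curve defined by 4 (x, y) control points."""
    p = np.asarray(control_points, dtype=float)      # shape (4, 2)
    t = np.linspace(0.0, 1.0, num_samples)[:, None]  # shape (num_samples, 1)
    # Bernstein basis polynomials of degree 3.
    b0 = (1 - t) ** 3
    b1 = 3 * (1 - t) ** 2 * t
    b2 = 3 * (1 - t) * t ** 2
    b3 = t ** 3
    return b0 * p[0] + b1 * p[1] + b2 * p[2] + b3 * p[3]  # shape (num_samples, 2)


# One curved boundary of a text instance, described by 4 hypothetical control points.
top_boundary = cubic_bezier([(0, 10), (30, 0), (60, 0), (90, 10)])
print(top_boundary.round(1))
```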
arXiv Detail & Related papers (2021-05-08T07:46:55Z) - Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach [34.63444886780274]
Text segmentation is a prerequisite in real-world text-related tasks.
We introduce Text Refinement Network (TexRNet), a novel text segmentation approach.
TexRNet consistently improves text segmentation performance by nearly 2% compared to other state-of-the-art segmentation methods.
arXiv Detail & Related papers (2020-11-27T22:50:09Z) - All you need is a second look: Towards Tighter Arbitrary shape text detection [80.85188469964346]
Long curved text instances tend to be fragmented because of the limited receptive field size of CNNs.
Simple representations using rectangle or quadrangle bounding boxes fall short when dealing with more challenging arbitrary-shaped texts.
NASK reconstructs text instances with a tighter representation using the predicted geometrical attributes.
arXiv Detail & Related papers (2020-04-26T17:03:41Z) - ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection [147.10751375922035]
We propose the ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with high response values in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z) - PuzzleNet: Scene Text Detection by Segment Context Graph Learning [9.701699882807251]
We propose a novel decomposition-based method, termed Puzzle Networks (PuzzleNet), to address the challenging scene text detection task.
By building segments as context graphs, MSGCN effectively employs segment context to predict combinations of segments.
Our method achieves performance better than or comparable to current state-of-the-art methods, benefiting from the exploitation of the segment context graph.
arXiv Detail & Related papers (2020-02-26T09:21:05Z) - Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)