MANGO: A Mask Attention Guided One-Stage Scene Text Spotter
- URL: http://arxiv.org/abs/2012.04350v1
- Date: Tue, 8 Dec 2020 10:47:49 GMT
- Title: MANGO: A Mask Attention Guided One-Stage Scene Text Spotter
- Authors: Liang Qiao, Ying Chen, Zhanzhan Cheng, Yunlu Xu, Yi Niu, Shiliang Pu
and Fei Wu
- Abstract summary: We propose a novel Mask AttentioN Guided One-stage text spotting framework named MANGO.
The proposed method achieves competitive and even new state-of-the-art performance on both regular and irregular text spotting benchmarks.
- Score: 41.66707532607276
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently end-to-end scene text spotting has become a popular research topic
due to its advantages of global optimization and high maintainability in real
applications. Most methods attempt to develop various region of interest (RoI)
operations to concatenate the detection part and the sequence recognition part
into a two-stage text spotting framework. However, in such a framework, the
recognition part is highly sensitive to the detected results (e.g., the
compactness of text contours). To address this problem, in this paper, we
propose a novel Mask AttentioN Guided One-stage text spotting framework named
MANGO, in which character sequences can be directly recognized without RoI
operations. Concretely, a position-aware mask attention module is developed to
generate attention weights on each text instance and its characters. It allows
different text instances in an image to be allocated to different feature map
channels, which are further grouped as a batch of instance features. Finally, a
lightweight sequence decoder is applied to generate the character sequences. It
is worth noting that MANGO inherently adapts to arbitrary-shaped text spotting
and can be trained end-to-end with only coarse position information
(e.g., a rectangular bounding box) and text annotations. Experimental
results show that the proposed method achieves competitive and even new
state-of-the-art performance on both regular and irregular text spotting
benchmarks, i.e., ICDAR 2013, ICDAR 2015, Total-Text, and SCUT-CTW1500.
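To make the channel-grouping idea from the abstract concrete, below is a minimal PyTorch-style sketch of a position-aware mask attention step: each (instance, character) slot gets its own spatial attention map, and a weighted sum over positions yields a batch of per-character instance features for a lightweight decoder. The slot counts (max_instances, max_chars), the single 1x1-conv attention head, and the linear decoder are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a MANGO-style position-aware mask attention step.
# All module shapes and names here are illustrative assumptions.
import torch
import torch.nn as nn

class PositionAwareMaskAttention(nn.Module):
    """Gives each of up to `max_instances` text instances (and up to
    `max_chars` characters per instance) its own spatial attention map,
    so instances can be pooled into a batch of sequence features
    without any RoI cropping."""
    def __init__(self, in_channels, max_instances=4, max_chars=25):
        super().__init__()
        self.max_instances = max_instances
        self.max_chars = max_chars
        # One attention logit map per (instance, character) slot.
        self.attn_head = nn.Conv2d(in_channels, max_instances * max_chars, 1)

    def forward(self, feats):                       # feats: (B, C, H, W)
        B, C, H, W = feats.shape
        logits = self.attn_head(feats)              # (B, S*T, H, W)
        attn = logits.view(B, self.max_instances, self.max_chars, H * W)
        attn = attn.softmax(dim=-1)                 # spatial attention per slot
        flat = feats.view(B, 1, 1, C, H * W)
        # Weighted sum over positions -> one feature per character slot.
        inst = (attn.unsqueeze(3) * flat).sum(-1)   # (B, S, T, C)
        return inst

feats = torch.randn(2, 64, 32, 32)
module = PositionAwareMaskAttention(64)
inst_feats = module(feats)                          # (2, 4, 25, 64)
# A lightweight decoder (here just a per-slot linear classifier; the
# charset size 97 is a placeholder) can now predict every instance's
# character sequence in parallel.
decoder = nn.Linear(64, 97)
char_logits = decoder(inst_feats)                   # (2, 4, 25, 97)
```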
Related papers
- Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt [10.17947324152468]
The region prompt tuning method decomposes the region text prompt into individual characters and splits the visual feature map into region visual tokens.
This allows a character to match the local features of a token, thereby avoiding the omission of detailed features and fine-grained text.
Our proposed method combines a general score map from the image-text process with a region score map derived from character-token matching.
arXiv Detail & Related papers (2024-09-20T15:24:26Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with a Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency [126.88107868670767]
We propose multi-View Consistent learning (ViewCo) for text-supervised semantic segmentation.
We first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image.
We also propose cross-view segmentation consistency modeling to address the ambiguity issue of text supervision.
arXiv Detail & Related papers (2023-01-31T01:57:52Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a Transformer with a dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither an additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network [54.03560668182197]
We propose a novel fully convolutional Point Gathering Network (PGNet) for reading arbitrarily-shaped text in real-time.
With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations.
Experiments show that the proposed method achieves competitive accuracy while significantly improving running speed (a minimal sketch of this style of decoding appears after this list).
arXiv Detail & Related papers (2021-04-12T13:27:34Z)
- SCATTER: Selective Context Attentional Scene Text Recognizer [16.311256552979835]
Scene Text Recognition (STR) is the task of recognizing text against complex image backgrounds.
Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes.
We introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER).
arXiv Detail & Related papers (2020-03-25T09:20:28Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
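As referenced in the PGNet entry above, here is a minimal sketch of PG-CTC-style greedy decoding: per-pixel character class vectors are gathered along a given text center-line point list, then repeats are merged and blanks dropped, with no NMS or RoI cropping. The charset, blank index, point list, and random inputs are hypothetical stand-ins, not PGNet's actual interface.

```python
# Hedged sketch of PG-CTC-style decoding under stated assumptions.
import numpy as np

BLANK = 0
CHARSET = "-abcdefghijklmnopqrstuvwxyz"   # index 0 plays the CTC blank

def pg_ctc_greedy_decode(char_map, center_points):
    """char_map: (num_classes, H, W) per-pixel character logits.
    center_points: ordered (y, x) pixels along one text center line."""
    # Gather one classification vector per center-line point (the
    # "point gathering" step -- no NMS or RoI operations involved).
    ys, xs = zip(*center_points)
    gathered = char_map[:, list(ys), list(xs)].T    # (num_points, num_classes)
    best = gathered.argmax(axis=1)                  # greedy per-point labels
    # Standard CTC best-path collapse: merge repeats, then drop blanks.
    out, prev = [], None
    for k in best:
        if k != prev and k != BLANK:
            out.append(CHARSET[k])
        prev = k
    return "".join(out)

rng = np.random.default_rng(0)
char_map = rng.standard_normal((len(CHARSET), 64, 64))
line = [(32, x) for x in range(10, 30)]             # a toy horizontal center line
print(pg_ctc_greedy_decode(char_map, line))
```

In practice the ordered points would come from PGNet's predicted text center-line maps; the collapse step above is the standard CTC best-path rule, which is what lets the method skip per-character localization.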