TextScanner: Reading Characters in Order for Robust Scene Text
Recognition
- URL: http://arxiv.org/abs/1912.12422v2
- Date: Wed, 1 Jan 2020 10:18:26 GMT
- Title: TextScanner: Reading Characters in Order for Robust Scene Text
Recognition
- Authors: Zhaoyi Wan, Minghang He, Haoran Chen, Xiang Bai and Cong Yao
- Abstract summary: TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and predicts character position and class in parallel.
- Score: 60.04267660533966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driven by deep learning and the large volume of data, scene text recognition
has evolved rapidly in recent years. Formerly, RNN-attention based methods have
dominated this field, but suffer from the problem of "attention drift"
in certain situations. Lately, semantic segmentation based algorithms have
proven effective at recognizing text of different forms (horizontal, oriented
and curved). However, these methods may produce spurious characters or miss
genuine characters, as they rely heavily on a thresholding procedure operated
on segmentation maps. To tackle these challenges, we propose in this paper an
alternative approach, called TextScanner, for scene text recognition.
TextScanner bears three characteristics: (1) Basically, it belongs to the
semantic segmentation family, as it generates pixel-wise, multi-channel
segmentation maps for character class, position and order; (2) Meanwhile, akin
to RNN-attention based methods, it also adopts RNN for context modeling; (3)
Moreover, it performs parallel prediction for character position and class,
and ensures that characters are transcribed in the correct order. The experiments
on standard benchmark datasets demonstrate that TextScanner outperforms the
state-of-the-art methods. Moreover, TextScanner shows its superiority in
recognizing more difficult text such as Chinese transcripts and aligning with
target characters.
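The abstract does not spell out how the class and order maps are combined into a transcription. The sketch below shows one plausible reading, under stated assumptions: each order map acts as a spatial weighting for the k-th character, and the character at position k is the class whose probability, summed over the image with those weights, is largest. The names `decode`, `class_map`, `order_maps` and `alphabet` are hypothetical, the toy data is made up for illustration, and background handling and sequence-length detection are left out.

```python
def decode(class_map, order_maps, alphabet):
    """Read characters in order from a class map and per-position order maps.

    Assumed layout (illustrative, not taken from the paper's code):
      class_map[y][x]      -> list of per-class probabilities at pixel (y, x)
      order_maps[k][y][x]  -> weight that pixel (y, x) belongs to the k-th character
    """
    text = []
    for order_map in order_maps:
        # Accumulate class probabilities weighted by this character's order map.
        scores = [0.0] * len(alphabet)
        for y, row in enumerate(order_map):
            for x, weight in enumerate(row):
                for c, prob in enumerate(class_map[y][x]):
                    scores[c] += weight * prob
        # Pick the class with the highest accumulated score.
        best = max(range(len(alphabet)), key=scores.__getitem__)
        text.append(alphabet[best])
    return "".join(text)


# Toy 1x3 "image": one pixel per character, three classes.
alphabet = "abc"
class_map = [[[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]]
order_maps = [
    [[1.0, 0.0, 0.0]],  # first character sits at the left pixel
    [[0.0, 1.0, 0.0]],  # second character sits at the middle pixel
    [[0.0, 0.0, 1.0]],  # third character sits at the right pixel
]
result = decode(class_map, order_maps, alphabet)
```

Because every position is scored independently, all characters can be predicted in parallel, and no thresholding of the segmentation maps is needed in this toy setting. Reversing `order_maps` reverses the reading order without touching the class map.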
Related papers
- General Detection-based Text Line Recognition [15.761142324480165]
We introduce a general detection-based approach to text line recognition, be it printed (OCR) or handwritten (HTR).
Our approach builds on a completely different paradigm than state-of-the-art HTR methods, which rely on autoregressive decoding.
We improve state-of-the-art performances for Chinese script recognition on the CASIA v2 dataset, and for cipher recognition on the Borg and Copiale datasets.
arXiv Detail & Related papers (2024-09-25T17:05:55Z)
- Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this task Out of Length (OOL) text recognition.
We propose a novel method called OOL Text Recognition with sub-String Matching (SMTR).
SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features.
arXiv Detail & Related papers (2024-07-17T05:02:17Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Weakly-Supervised Text Instance Segmentation [44.20745377169349]
We take the first attempt to perform weakly-supervised text instance segmentation by bridging text recognition and text segmentation.
The proposed method significantly outperforms weakly-supervised instance segmentation methods on ICDAR13-FST (18.95% improvement) and TextSeg (17.80% improvement) benchmarks.
arXiv Detail & Related papers (2023-03-20T03:56:47Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Unsupervised learning of text line segmentation by differentiating coarse patterns [0.0]
We present an unsupervised deep learning method that embeds document image patches to a compact Euclidean space where distances correspond to a coarse text line pattern similarity.
Text line segmentation can be easily implemented using standard techniques with the embedded feature vectors.
We evaluate the method qualitatively and quantitatively on several variants of text line segmentation datasets to demonstrate its effectiveness.
arXiv Detail & Related papers (2021-05-19T21:21:30Z)
- SCATTER: Selective Context Attentional Scene Text Recognizer [16.311256552979835]
Scene Text Recognition (STR) is the task of recognizing text against complex image backgrounds.
Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes.
We introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER).
arXiv Detail & Related papers (2020-03-25T09:20:28Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)