SCATTER: Selective Context Attentional Scene Text Recognizer
- URL: http://arxiv.org/abs/2003.11288v1
- Date: Wed, 25 Mar 2020 09:20:28 GMT
- Title: SCATTER: Selective Context Attentional Scene Text Recognizer
- Authors: Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor and
R. Manmatha
- Abstract summary: Scene Text Recognition (STR) is the task of recognizing text against complex image backgrounds.
Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes.
We introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER)
- Score: 16.311256552979835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Text Recognition (STR), the task of recognizing text against complex
image backgrounds, is an active area of research. Current state-of-the-art
(SOTA) methods still struggle to recognize text written in arbitrary shapes. In
this paper, we introduce a novel architecture for STR, named Selective Context
ATtentional Text Recognizer (SCATTER). SCATTER utilizes a stacked block
architecture with intermediate supervision during training, which paves the way
to successfully train a deep BiLSTM encoder, thus improving the encoding of
contextual dependencies. Decoding is done using a two-step 1D attention
mechanism. The first attention step re-weights visual features from a CNN
backbone together with contextual features computed by a BiLSTM layer. The
second attention step, similar to previous papers, treats the features as a
sequence and attends to the intra-sequence relationships. Experiments show that
the proposed approach surpasses SOTA performance on irregular text recognition
benchmarks by 3.7% on average.
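The stacked-block design described above can be summarized roughly in code. The following PyTorch-style sketch is inferred from the abstract alone, not from the authors' implementation: module names, feature dimensions, the character-set size, and the use of standard softmax/multi-head attention primitives in place of the paper's exact 1D attention formulation are all assumptions.

```python
import torch
import torch.nn as nn

class SelectiveDecoderBlock(nn.Module):
    """Rough sketch of one SCATTER-style block, inferred from the abstract:
    a BiLSTM encoder over CNN column features, a first attention step that
    re-weights visual + contextual features, and a second attention step
    over the resulting sequence. Names and dimensions are assumptions."""

    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=97):
        super().__init__()
        # Contextual encoder over the CNN feature sequence (B, T, feat_dim)
        self.bilstm = nn.LSTM(feat_dim, hidden_dim, bidirectional=True,
                              batch_first=True)
        # Step 1: selective re-weighting of fused visual/contextual features
        self.select_attn = nn.Linear(feat_dim + 2 * hidden_dim, 1)
        # Step 2: intra-sequence attention, sketched with multi-head attention
        self.seq_attn = nn.MultiheadAttention(feat_dim + 2 * hidden_dim,
                                              num_heads=4, batch_first=True)
        # Each block emits its own character logits for intermediate supervision
        self.classifier = nn.Linear(feat_dim + 2 * hidden_dim, num_classes)

    def forward(self, visual_feats):
        # visual_feats: (B, T, feat_dim) column features from the CNN backbone
        contextual, _ = self.bilstm(visual_feats)               # (B, T, 2*hidden)
        fused = torch.cat([visual_feats, contextual], dim=-1)   # (B, T, D)

        # Step 1: per-position weights over the fused features
        weights = torch.softmax(self.select_attn(fused), dim=1)   # (B, T, 1)
        selected = fused * weights                                 # (B, T, D)

        # Step 2: attend to intra-sequence relationships
        attended, _ = self.seq_attn(selected, selected, selected)  # (B, T, D)

        # Per-block logits used for intermediate supervision during training
        logits = self.classifier(attended)                         # (B, T, classes)
        return attended, logits
```

In the full model, several such blocks would presumably be stacked, with each block's logits supervised during training (the intermediate supervision the abstract describes) and the final block's output used for decoding.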
Related papers
- Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling [44.70973195966149]
The existing scene text removal (STR) task suffers from insufficient training data due to expensive pixel-level labeling.
We introduce a Text-aware Masked Image Modeling algorithm (TMIM), which can pretrain STR models with low-cost text detection labels.
Our method outperforms other pretrain methods and achieves state-of-the-art performance (37.35 PSNR on SCUT-EnsText)
arXiv Detail & Related papers (2024-09-20T11:52:57Z)
- Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets.
We introduce a novel method named Decoder Pre-training with only text for STR (DPTR)
DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
- Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this task Out of Length (OOL) text recognition.
We propose a novel method called OOL Text Recognition with sub-String Matching (SMTR)
SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features.
arXiv Detail & Related papers (2024-07-17T05:02:17Z)
- ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting [8.397246652127793]
We propose a new pre-training method called OCR-Text Destylization Modeling (ODM)
ODM transfers diverse styles of text found in images to a uniform style based on the text prompt.
Our method significantly improves performance and outperforms current pre-training methods in scene text detection and spotting tasks.
arXiv Detail & Related papers (2024-03-01T06:13:53Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with a dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter [38.4211220941874]
We propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA)
IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFA inference.
We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
arXiv Detail & Related papers (2021-06-10T17:06:28Z)
- Primitive Representation Learning for Scene Text Recognition [7.818765015637802]
We propose a primitive representation learning method that aims to exploit intrinsic representations of scene text images.
A Primitive REpresentation learning Network (PREN) is constructed to use the visual text representations for parallel decoding.
We also propose a framework called PREN2D to alleviate the misalignment problem in attention-based methods.
arXiv Detail & Related papers (2021-05-10T11:54:49Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.