Related papers: SCATTER: Selective Context Attentional Scene Text Recognizer

SCATTER: Selective Context Attentional Scene Text Recognizer

URL: http://arxiv.org/abs/2003.11288v1
Date: Wed, 25 Mar 2020 09:20:28 GMT
Title: SCATTER: Selective Context Attentional Scene Text Recognizer
Authors: Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor and R. Manmatha
Abstract summary: Scene Text Recognition (STR) is the task of recognizing text against complex image backgrounds. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. We introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER)
Score: 16.311256552979835
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). SCATTER utilizes a stacked block architecture with intermediate supervision during training, that paves the way to successfully train a deep BiLSTM encoder, thus improving the encoding of contextual dependencies. Decoding is done using a two-step 1D attention mechanism. The first attention step re-weights visual features from a CNN backbone together with contextual features computed by a BiLSTM layer. The second attention step, similar to previous papers, treats the features as a sequence and attends to the intra-sequence relationships. Experiments show that the proposed approach surpasses SOTA performance on irregular text recognition benchmarks by 3.7\% on average.

Related papers

Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling [44.70973195966149]
Existing scene text removal (STR) task suffers from insufficient training data due to the expensive pixel-level labeling. We introduce a Text-aware Masked Image Modeling algorithm (TMIM), which can pretrain STR models with low-cost text detection labels. Our method outperforms other pretrain methods and achieves state-of-the-art performance (37.35 PSNR on SCUT-EnsText)
arXiv Detail & Related papers (2024-09-20T11:52:57Z)
Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets. We introduce a novel method named Decoder Pre-training with only text for STR (DPTR) DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this task Out of Length (OOL) text recognition. We propose a novel method called OOL Text Recognition with sub-String Matching (SMTR) SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features.
arXiv Detail & Related papers (2024-07-17T05:02:17Z)
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting [8.397246652127793]
We propose a new pre-training method called OCR-Text Destylization Modeling (ODM) ODM transfers diverse styles of text found in images to a uniform style based on the text prompt. Our method significantly improves performance and outperforms current pre-training methods in scene text detection and spotting tasks.
arXiv Detail & Related papers (2024-03-01T06:13:53Z)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter. Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism. The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter [38.4211220941874]
We propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA) IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFAinference. We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
arXiv Detail & Related papers (2021-06-10T17:06:28Z)
Primitive Representation Learning for Scene Text Recognition [7.818765015637802]
We propose a primitive representation learning method that aims to exploit intrinsic representations of scene text images. A Primitive REpresentation learning Network (PREN) is constructed to use the visual text representations for parallel decoding. We also propose a framework called PREN2D to alleviate the misalignment problem in attention-based methods.
arXiv Detail & Related papers (2021-05-10T11:54:49Z)
Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron. It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information. Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition. It generates pixel-wise, multi-channel segmentation maps for character class, position and order. It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.