LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition
- URL: http://arxiv.org/abs/2308.12774v1
- Date: Thu, 24 Aug 2023 13:26:18 GMT
- Title: LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition
- Authors: Changxu Cheng, Peng Wang, Cheng Da, Qi Zheng, Cong Yao
- Abstract summary: We propose a method called Length-Insensitive Scene TExt Recognizer (LISTER).
A Neighbor Decoder is proposed to obtain accurate character attention maps with the assistance of a novel neighbor matrix.
A Feature Enhancement Module is devised to model the long-range dependency with low cost.
- Score: 27.280917081410955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The diversity in length constitutes a significant characteristic of text. Due
to the long-tail distribution of text lengths, most existing methods for scene
text recognition (STR) only work well on short or seen-length text, lacking the
capability of recognizing longer text or performing length extrapolation. This
is a crucial issue, since the lengths of the text to be recognized are usually
not given in advance in real-world applications, but it has not been adequately
investigated in previous works. Therefore, we propose in this paper a method
called Length-Insensitive Scene TExt Recognizer (LISTER), which remedies the
limitation regarding the robustness to various text lengths. Specifically, a
Neighbor Decoder is proposed to obtain accurate character attention maps with
the assistance of a novel neighbor matrix regardless of the text lengths.
Besides, a Feature Enhancement Module is devised to model the long-range
dependency with low computation cost, which is able to perform iterations with
the neighbor decoder to enhance the feature map progressively. To the best of
our knowledge, we are the first to achieve effective length-insensitive scene
text recognition. Extensive experiments demonstrate that the proposed LISTER
algorithm exhibits obvious superiority on long text recognition and the ability
for length extrapolation, while comparing favourably with the previous
state-of-the-art methods on standard benchmarks for STR (mainly short text).
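The abstract does not describe how the neighbor matrix is used, so the following is only a toy illustration of one plausible reading, not the authors' implementation. It assumes the neighbor matrix is a row-stochastic matrix over feature positions that maps the attention map of the current character to that of the next one, and that decoding stops when attention reaches a hypothetical end slot. All names (next_attention, decode_positions, NEIGHBOR, START) and the toy values are assumptions made for this sketch.

```python
# Illustrative sketch only: the names and the toy matrix below are assumptions,
# not taken from the LISTER paper or its released code.

def next_attention(attention, neighbor):
    """Propagate an attention map one step: a_next[j] = sum_i a[i] * N[i][j]."""
    size = len(attention)
    return [
        sum(attention[i] * neighbor[i][j] for i in range(size))
        for j in range(size)
    ]


def decode_positions(start, neighbor, end_index, max_steps=100):
    """Follow attention maps until the peak lands on the (hypothetical) end slot."""
    positions = []
    attention = start
    for _ in range(max_steps):
        # Index of the most attended position for the current character.
        peak = max(range(len(attention)), key=attention.__getitem__)
        if peak == end_index:
            break
        positions.append(peak)
        attention = next_attention(attention, neighbor)
    return positions


# Toy input: 4 feature positions plus one end slot (index 4).
# Row i says where the next character sits, given the current one at i.
NEIGHBOR = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
START = [1.0, 0.0, 0.0, 0.0, 0.0]  # attention map of the first character

positions = decode_positions(START, NEIGHBOR, end_index=4)
print(positions)  # [0, 1, 2, 3]
```

In this sketch the number of decoding steps is never fixed in advance: the loop runs for as many characters as the matrix chains together, which is the sense in which such a scheme would not depend on text lengths seen during training.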
Related papers
- Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval [13.315951821189538]
Scene text retrieval aims to find all images containing the query text from an image gallery.
Current efforts tend to adopt an Optical Character Recognition (OCR) pipeline, which requires complicated text detection and/or recognition processes.
We propose to explore the intrinsic potential of Contrastive Language-Image Pre-training (CLIP) for OCR-free scene text retrieval.
arXiv Detail & Related papers (2024-08-01T10:25:14Z)
- Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this task Out of Length (OOL) text recognition.
We propose a novel method called OOL Text Recognition with sub-String Matching (SMTR).
SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features.
arXiv Detail & Related papers (2024-07-17T05:02:17Z)
- Word length-aware text spotting: Enhancing detection and recognition in dense text image [33.44340604133642]
We present WordLenSpotter, a novel word length-aware spotter for scene text image detection and recognition.
We improve the spotting capabilities for long and short words, particularly in the tail data of dense text images.
arXiv Detail & Related papers (2023-12-25T10:46:20Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter [38.4211220941874]
We propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA).
IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFA-inference.
We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
arXiv Detail & Related papers (2021-06-10T17:06:28Z)
- Text Guide: Improving the quality of long text classification by a text selection method based on feature importance [0.0]
We propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit.
We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification.
arXiv Detail & Related papers (2021-04-15T04:10:08Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.