Implicit Feature Alignment: Learn to Convert Text Recognizer to Text
Spotter
- URL: http://arxiv.org/abs/2106.05920v1
- Date: Thu, 10 Jun 2021 17:06:28 GMT
- Title: Implicit Feature Alignment: Learn to Convert Text Recognizer to Text
Spotter
- Authors: Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Dezhi Peng, Zhe Li, Mengchao
He, Yongpan Wang, Canjie Luo
- Abstract summary: We propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA).
IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFAinference.
We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
- Score: 38.4211220941874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text recognition is a popular research subject with many associated
challenges. Despite the considerable progress made in recent years, the text
recognition task itself is still constrained to solve the problem of reading
cropped line text images and serves as a subtask of optical character
recognition (OCR) systems. As a result, the final text recognition result is
limited by the performance of the text detector. In this paper, we propose a
simple, elegant and effective paradigm called Implicit Feature Alignment (IFA),
which can be easily integrated into current text recognizers, resulting in a
novel inference mechanism called IFAinference. This enables an ordinary text
recognizer to process multi-line text such that text detection can be
completely freed. Specifically, we integrate IFA into the two most prevailing
text recognition streams (attention-based and CTC-based) and propose
attention-guided dense prediction (ADP) and Extended CTC (ExCTC). Furthermore,
the Wasserstein-based Hollow Aggregation Cross-Entropy (WH-ACE) is proposed to
suppress negative predictions to assist in training ADP and ExCTC. We
experimentally demonstrate that IFA achieves state-of-the-art performance on
end-to-end document recognition tasks while maintaining the fastest speed, and
ADP and ExCTC complement each other from the perspective of different application
scenarios. Code will be available at
https://github.com/WangTianwei/Implicit-feature-alignment.
Related papers
- Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval [13.315951821189538]
Scene text retrieval aims to find all images containing the query text from an image gallery.
Current efforts tend to adopt an Optical Character Recognition (OCR) pipeline, which requires complicated text detection and/or recognition processes.
We propose to explore the intrinsic potential of Contrastive Language-Image Pre-training (CLIP) for OCR-free scene text retrieval.
arXiv Detail & Related papers (2024-08-01T10:25:14Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
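The summary above describes replacing one-hot targets with distributions derived from a text corpus. A hypothetical sketch of that general idea, mixing a one-hot character target with a corpus-estimated unigram prior in a label-smoothing style (function and parameter names here are illustrative, not from the paper):

```python
# Hypothetical illustration: soft character targets built from a corpus prior.
# This is a label-smoothing-style stand-in for the paper's learned text
# distributions; eps and the unigram prior are assumptions for the sketch.
from collections import Counter

def corpus_char_distribution(corpus, alphabet):
    """Estimate per-character unigram frequencies over a fixed alphabet."""
    counts = Counter(c for word in corpus for c in word if c in alphabet)
    total = sum(counts.values()) or 1
    return {c: counts[c] / total for c in alphabet}

def soft_target(char, prior, eps=0.1):
    """Mix a one-hot target for `char` with the corpus prior."""
    return {c: (1 - eps) * (1.0 if c == char else 0.0) + eps * p
            for c, p in prior.items()}
```

The resulting target still sums to 1 but assigns small probability mass to corpus-frequent characters, which is one simple way to inject linguistic knowledge into the training signal.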
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z)
- Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning [41.56134008044702]
Optimal Boxes is a reinforcement-learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models.
Experiments demonstrate that the performance of end-to-end text recognition systems can be improved when using the adjusted bounding boxes as the ground truths for training.
arXiv Detail & Related papers (2022-07-25T06:58:45Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- SCATTER: Selective Context Attentional Scene Text Recognizer [16.311256552979835]
Scene Text Recognition (STR) is the task of recognizing text against complex image backgrounds.
Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes.
We introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER).
arXiv Detail & Related papers (2020-03-25T09:20:28Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.