Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting
Annotated Bounding Boxes via Reinforcement Learning
- URL: http://arxiv.org/abs/2207.11934v2
- Date: Tue, 26 Jul 2022 07:14:17 GMT
- Title: Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting
Annotated Bounding Boxes via Reinforcement Learning
- Authors: Jingqun Tang, Wenming Qian, Luchuan Song, Xiena Dong, Lan Li, Xiang
Bai
- Abstract summary: Box Adjuster is a reinforcement learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models.
Experiments demonstrate that the performance of end-to-end text recognition systems can be improved when using the adjusted bounding boxes as the ground truths for training.
- Score: 41.56134008044702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text detection and recognition are essential components of a modern OCR
system. Most OCR approaches attempt to obtain accurate bounding boxes of text
at the detection stage, which are then used as the input to the text recognition
stage. We observe that when using tight text bounding boxes as input, a text
recognizer frequently fails to achieve optimal performance due to the
inconsistency between bounding boxes and deep representations of text
recognition. In this paper, we propose Box Adjuster, a reinforcement
learning-based method for adjusting the shape of each text bounding box to make
it more compatible with text recognition models. Additionally, when dealing
with cross-domain problems such as synthetic-to-real, the proposed method
significantly reduces mismatches in domain distribution between the source and
target domains. Experiments demonstrate that the performance of end-to-end text
recognition systems can be improved when using the adjusted bounding boxes as
the ground truths for training. Specifically, on several benchmark datasets for
scene text understanding, the proposed method outperforms state-of-the-art text
spotters by an average of 2.0% F-Score on end-to-end text recognition tasks and
4.6% F-Score on domain adaptation tasks.
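The paper's learned policy and reward model are not reproduced here; the snippet below is only a minimal sketch of the underlying idea, assuming a frozen, pre-trained recognizer whose confidence in the annotated transcript acts as the reward, and replacing the learned policy with a simple epsilon-greedy search over a discrete set of box perturbations. The callable `recognizer_confidence` and the step sizes are hypothetical placeholders, not the paper's implementation.

```python
import random

# Discrete action set: (name, dx1, dy1, dx2, dy2) applied to an axis-aligned
# box (x1, y1, x2, y2); negative deltas move an edge left/up (i.e. outward
# for the left/top edges).
ACTIONS = [
    ("expand_left",  -2,  0, 0, 0), ("expand_right",  0, 0,  2, 0),
    ("expand_top",    0, -2, 0, 0), ("expand_bottom", 0, 0,  0, 2),
    ("shrink_left",   2,  0, 0, 0), ("shrink_right",  0, 0, -2, 0),
    ("stop",          0,  0, 0, 0),
]

def adjust_box(image, box, transcript, recognizer_confidence,
               steps=20, eps=0.1):
    """Adjust an annotated box so a frozen recognizer becomes more confident
    in the annotated transcript. `recognizer_confidence(image, box, transcript)`
    is a hypothetical callable returning a score in [0, 1]."""
    def score(b):
        return recognizer_confidence(image, b, transcript)

    def apply(b, a):
        _, dx1, dy1, dx2, dy2 = a
        return (b[0] + dx1, b[1] + dy1, b[2] + dx2, b[3] + dy2)

    best_box, best_reward = box, score(box)
    for _ in range(steps):
        if random.random() < eps:          # explore a random perturbation
            action = random.choice(ACTIONS)
        else:                              # exploit: best one-step reward
            action = max(ACTIONS, key=lambda a: score(apply(best_box, a)))
        if action[0] == "stop":
            break
        candidate = apply(best_box, action)
        reward = score(candidate)
        if reward > best_reward:           # keep only improving moves
            best_box, best_reward = candidate, reward
    return best_box
```

In the paper, the adjusted boxes then replace the original annotations as ground truths for training the end-to-end spotter; that retraining step is out of scope for this sketch.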
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models; a minimal sketch of this idea appears after the list below.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Decoupling Recognition from Detection: Single Shot Self-Reliant Scene Text Spotter [34.09162878714425]
We propose the single shot Self-Reliant Scene Text Spotter (SRSTS).
We conduct text detection and recognition in parallel and bridge them by the shared positive anchor point.
Our method is able to recognize the text instances correctly even though the precise text boundaries are challenging to detect.
arXiv Detail & Related papers (2022-07-15T01:59:14Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer [21.479222207347238]
We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting.
TTS is trained in both fully- and weakly-supervised settings.
When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
arXiv Detail & Related papers (2022-02-11T08:50:09Z)
- ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter [37.86206423441885]
We present a simple yet robust end-to-end text spotting framework, termed Auto-Rectification Text Spotter (ARTS).
Our method achieves 77.1% end-to-end text spotting F-measure on Total-Text at a competitive speed of 10.5 FPS.
arXiv Detail & Related papers (2021-10-20T06:53:44Z)
- Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter [38.4211220941874]
We propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA).
IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFAinference.
We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
arXiv Detail & Related papers (2021-06-10T17:06:28Z)
- MOST: A Multi-Oriented Scene Text Detector with Localization Refinement [67.35280008722255]
We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization.
Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features.
A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to exclude unreliable detections.
arXiv Detail & Related papers (2021-04-02T14:34:41Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
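The "Efficiently Leveraging Linguistic Priors" entry above describes replacing one-hot recognition targets with distributions derived from a large text corpus. The exact construction is not given in the summary, so the snippet below is only a minimal, hypothetical illustration: a character-unigram prior estimated from a toy corpus, interpolated with the one-hot label. The helper names `corpus_unigram` and `soft_target` are invented for this sketch.

```python
from collections import Counter

def corpus_unigram(corpus_words, vocab):
    """Character unigram distribution estimated from a text corpus,
    restricted to the recognizer's character vocabulary."""
    counts = Counter(c for w in corpus_words for c in w if c in vocab)
    total = sum(counts.values()) or 1
    return [counts.get(c, 0) / total for c in vocab]

def soft_target(char, vocab, prior, alpha=0.1):
    """Soft recognition target: the one-hot vector for `char` interpolated
    with the corpus prior, instead of a pure one-hot label."""
    one_hot = [1.0 if c == char else 0.0 for c in vocab]
    return [(1 - alpha) * o + alpha * p for o, p in zip(one_hot, prior)]

# Toy usage: the resulting target still sums to 1.0 and is peaked at 'e',
# but carries corpus statistics in its remaining mass.
vocab = list("abcdefghijklmnopqrstuvwxyz")
prior = corpus_unigram(["street", "exit", "cafe", "open"], vocab)
target = soft_target("e", vocab, prior)
```

In practice the prior would be estimated from a much larger corpus and, as the entry notes, chosen so the resulting text distributions align with scene text datasets rather than generic text.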