ARTS: Eliminating Inconsistency between Text Detection and Recognition
with Auto-Rectification Text Spotter
- URL: http://arxiv.org/abs/2110.10405v1
- Date: Wed, 20 Oct 2021 06:53:44 GMT
- Authors: Humen Zhong, Jun Tang, Wenhai Wang, Zhibo Yang, Cong Yao, Tong Lu
- Abstract summary: We present a simple yet robust end-to-end text spotting framework, termed Auto-Rectification Text Spotter (ARTS).
Our method achieves 77.1% end-to-end text spotting F-measure on Total-Text at a competitive speed of 10.5 FPS.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent approaches for end-to-end text spotting have achieved promising
results. However, most current spotters are plagued by an inconsistency problem
between text detection and recognition. In this work, we introduce and prove
the existence of this inconsistency problem and analyze it from two aspects:
(1) inconsistency of text recognition features between training and testing,
and (2) inconsistency of optimization targets between text detection and
recognition. To solve these issues, we propose a differentiable
Auto-Rectification Module (ARM) together with a new training strategy that
propagates the recognition loss back into the detection branch, so that the
detection branch can be jointly optimized by detection and recognition targets,
which largely alleviates the inconsistency between text detection and
recognition. Based on these designs, we present a simple yet robust end-to-end
text spotting framework, termed Auto-Rectification Text Spotter (ARTS), to
detect and recognize arbitrarily-shaped text in natural scenes. Extensive
experiments demonstrate the superiority of our method. In particular, our
ARTS-S achieves a 77.1% end-to-end text spotting F-measure on Total-Text at a
competitive speed of 10.5 FPS, significantly outperforming previous methods in
both accuracy and inference speed.
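The core idea of the abstract — a differentiable rectification step that lets the recognition loss backpropagate into the detection branch — can be illustrated with a minimal sketch. This is not the authors' code: the affine transform, the toy tensors, and the stand-in losses are assumptions chosen only to show how gradients from a recognizer-side loss reach detector-predicted parameters through a differentiable warp (here `affine_grid`/`grid_sample` from PyTorch).

```python
# Hedged sketch (assumed PyTorch, not the ARTS implementation): a
# differentiable rectification lets recognition loss flow into detection.
import torch
import torch.nn.functional as F

def rectify(features, theta):
    # theta: (N, 2, 3) affine parameters, standing in for the geometry
    # predicted by a detection branch. affine_grid and grid_sample are
    # differentiable w.r.t. theta, so recognizer gradients reach the detector.
    grid = F.affine_grid(theta, features.size(), align_corners=False)
    return F.grid_sample(features, grid, align_corners=False)

# toy feature map and detector-predicted transform (hypothetical shapes)
feats = torch.randn(1, 8, 16, 48)
theta = torch.eye(2, 3).unsqueeze(0).requires_grad_(True)

rectified = rectify(feats, theta)
rec_loss = rectified.mean()       # stand-in for the recognition loss
det_loss = theta.abs().mean()     # stand-in for the detection loss
loss = det_loss + 1.0 * rec_loss  # joint objective: L = L_det + lambda * L_rec
loss.backward()

# the recognition term contributes gradient to the detection parameters,
# which is the joint-optimization effect the abstract describes
assert theta.grad is not None
```

Because the warp is differentiable, both loss terms update `theta`; with a non-differentiable crop (e.g. hard RoI extraction from final boxes), the recognition term would contribute no gradient to the detector, which is the training/testing inconsistency the paper targets.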
Related papers
- SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between the two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieves state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z)
- ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
We introduce the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter).
Our model achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2023-08-20T03:22:23Z)
- Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning
Optimal Boxes is a reinforcement learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models.
Experiments demonstrate that the performance of end-to-end text recognition systems can be improved when the adjusted bounding boxes are used as ground truth for training.
arXiv Detail & Related papers (2022-07-25T06:58:45Z)
- Decoupling Recognition from Detection: Single Shot Self-Reliant Scene Text Spotter
We propose the single-shot Self-Reliant Scene Text Spotter (SRSTS).
We conduct text detection and recognition in parallel and bridge them by a shared positive anchor point.
Our method is able to recognize text instances correctly even when the precise text boundaries are challenging to detect.
arXiv Detail & Related papers (2022-07-15T01:59:14Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with a dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither an additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting
We propose DEER, a novel Detection-agnostic End-to-End Recognizer framework.
The proposed method reduces the tight dependency between the detection and recognition modules.
It achieves competitive results on regular and arbitrarily-shaped text spotting benchmarks.
arXiv Detail & Related papers (2022-03-10T02:41:05Z)
- Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer
We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting.
TTS can be trained in both fully- and weakly-supervised settings.
When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
arXiv Detail & Related papers (2022-02-11T08:50:09Z)
- MOST: A Multi-Oriented Scene Text Detector with Localization Refinement
We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization.
Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features.
A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to exclude unreliable detections.
arXiv Detail & Related papers (2021-04-02T14:34:41Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.