Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting
- URL: http://arxiv.org/abs/2002.06820v2
- Date: Mon, 25 Oct 2021 09:34:22 GMT
- Title: Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting
- Authors: Liang Qiao, Sanli Tang, Zhanzhan Cheng, Yunlu Xu, Yi Niu, Shiliang Pu, and Fei Wu
- Abstract summary: We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
- Score: 49.768327669098674
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many approaches have recently been proposed to detect irregular scene text
and have achieved promising results. However, their localization results may not
suit the subsequent text recognition stage well, mainly for two reasons:
1) recognizing arbitrarily shaped text remains a challenging task, and 2) the
prevalent non-trainable pipeline strategies between text detection and text
recognition lead to suboptimal performance. To handle this
incompatibility problem, in this paper we propose an end-to-end trainable text
spotting approach named Text Perceptron. Concretely, Text Perceptron first
employs an efficient segmentation-based text detector that learns the latent
text reading order and boundary information. Then a novel Shape Transform
Module (abbr. STM) is designed to transform the detected feature regions into
regular morphologies without extra parameters. It unites text detection and the
subsequent recognition part into a single framework, enabling global optimization
of the whole network. Experiments show that our method achieves
competitive performance on two standard text benchmarks, i.e., ICDAR 2013 and
ICDAR 2015, and clearly outperforms existing methods on the irregular text
benchmarks SCUT-CTW1500 and Total-Text.
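
To make the detect-then-rectify idea concrete, below is a minimal PyTorch-style sketch of the kind of parameter-free shape transform the abstract describes: an arbitrarily shaped text region is warped into a regular rectangular feature patch using only its detected boundary points, so gradients from the recognizer can flow back into the detector. This is an illustrative assumption of how such a module could look, not the authors' implementation; the function name rectify_region, the fiducial-point inputs, and the bilinear-grid construction are all hypothetical.

```python
# Minimal sketch (not the authors' code) of a parameter-free shape transform:
# warp an arbitrarily shaped text region into a regular rectangular feature
# patch from its boundary points alone, keeping the operation differentiable.
import torch
import torch.nn.functional as F


def rectify_region(feat, top_pts, bot_pts, out_h=8, out_w=32):
    """Warp one text region into a fixed-size rectangle.

    feat:     (1, C, H, W) feature map from the shared backbone
    top_pts:  (K, 2) fiducial points on the upper text boundary,
              as (x, y) in normalized coordinates in [-1, 1]
    bot_pts:  (K, 2) fiducial points on the lower text boundary
    returns:  (1, C, out_h, out_w) rectified feature patch
    """
    # Resample the K boundary points to out_w columns by linear interpolation.
    top = F.interpolate(top_pts.t().unsqueeze(0), size=out_w,
                        mode="linear", align_corners=True)[0].t()  # (out_w, 2)
    bot = F.interpolate(bot_pts.t().unsqueeze(0), size=out_w,
                        mode="linear", align_corners=True)[0].t()  # (out_w, 2)
    # Blend top and bottom boundaries row by row to fill the sampling grid.
    alpha = torch.linspace(0, 1, out_h).view(out_h, 1, 1)
    grid = (1 - alpha) * top.unsqueeze(0) + alpha * bot.unsqueeze(0)  # (out_h, out_w, 2)
    # grid_sample expects a (N, H_out, W_out, 2) grid of (x, y) in [-1, 1].
    return F.grid_sample(feat, grid.unsqueeze(0), mode="bilinear",
                         align_corners=True)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 32, 128)           # toy backbone feature map
    # A gently curved region: 7 fiducial points per boundary.
    xs = torch.linspace(-0.8, 0.8, 7)
    top_pts = torch.stack([xs, -0.4 + 0.1 * xs.pow(2)], dim=1)
    bot_pts = torch.stack([xs,  0.4 + 0.1 * xs.pow(2)], dim=1)
    patch = rectify_region(feat, top_pts, bot_pts)
    print(patch.shape)                            # torch.Size([1, 64, 8, 32])
```

Because the warp is built purely from detected boundary points via grid sampling, it adds no learnable parameters, which is consistent with the abstract's claim that the shape transform unites detection and recognition without extra weights.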
Related papers
- SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieved state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z)
- ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer [88.61312640540902]
We introduce the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter).
Our model achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2023-08-20T03:22:23Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning [41.56134008044702]
Optimal Boxes is a reinforcement learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models.
Experiments demonstrate that the performance of end-to-end text recognition systems can be improved when using the adjusted bounding boxes as the ground truths for training.
arXiv Detail & Related papers (2022-07-25T06:58:45Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer [21.479222207347238]
We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting.
TTS can be trained in both fully- and weakly-supervised settings.
When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
arXiv Detail & Related papers (2022-02-11T08:50:09Z)
- Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter [38.4211220941874]
We propose a simple, elegant, and effective paradigm called Implicit Feature Alignment (IFA).
IFA can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFA inference.
We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks.
arXiv Detail & Related papers (2021-06-10T17:06:28Z)