SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
- URL: http://arxiv.org/abs/2203.10209v1
- Date: Sat, 19 Mar 2022 01:14:42 GMT
- Title: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
- Authors: Mingxin Huang, Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin,
Shenggao Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin
- Abstract summary: We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with a dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither an additional rectification module nor character-level annotation.
- Score: 73.61592015908353
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: End-to-end scene text spotting has attracted great attention in recent years
due to the success of excavating the intrinsic synergy of the scene text
detection and recognition. However, recent state-of-the-art methods usually
incorporate detection and recognition simply by sharing the backbone, which
does not directly take advantage of the feature interaction between the two
tasks. In this paper, we propose a new end-to-end scene text spotting framework
termed SwinTextSpotter. Using a transformer encoder with a dynamic head as the
detector, we unify the two tasks with a novel Recognition Conversion mechanism
to explicitly guide text localization through recognition loss. The
straightforward design results in a concise framework that requires neither an
additional rectification module nor character-level annotation for
arbitrarily-shaped text. Qualitative and quantitative experiments on the
multi-oriented datasets RoIC13 and ICDAR 2015, the arbitrarily-shaped datasets
Total-Text and CTW1500, and the multi-lingual datasets ReCTS (Chinese) and VinText
(Vietnamese) demonstrate that SwinTextSpotter significantly outperforms existing
methods. Code is available at https://github.com/mxin262/SwinTextSpotter.
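The abstract's key mechanism is that recognition loss explicitly guides text localization. The sketch below illustrates only that gradient path, under a deliberately simplified and hypothetical assumption: the detector emits per-position logits whose sigmoid acts as a soft text mask that is multiplied into the recognition features. This is not the paper's Recognition Conversion, which operates on feature maps inside the network; the function names, toy values, and squared-error loss are illustrative only.

```python
import math

# Hypothetical illustration, NOT the authors' Recognition Conversion:
# detection logits -> sigmoid soft mask -> gates recognition features.
# Because the gate is differentiable, the recognition loss yields a
# nonzero gradient with respect to the detection outputs.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def recognition_loss(det_logits, rec_feats, targets):
    """Toy squared error between mask-gated recognition features and targets."""
    loss = 0.0
    for d, f, t in zip(det_logits, rec_feats, targets):
        gated = sigmoid(d) * f  # detection mask weights the recognition feature
        loss += (gated - t) ** 2
    return loss

def grad_wrt_detection(det_logits, rec_feats, targets):
    """Analytic dL/dd_i = 2 (m_i f_i - t_i) * f_i * m_i (1 - m_i), with m_i = sigmoid(d_i)."""
    grads = []
    for d, f, t in zip(det_logits, rec_feats, targets):
        m = sigmoid(d)
        grads.append(2.0 * (m * f - t) * f * m * (1.0 - m))
    return grads

def numeric_grad(det_logits, rec_feats, targets, eps=1e-6):
    """Central finite differences, used to check the analytic gradient."""
    grads = []
    for i in range(len(det_logits)):
        plus = list(det_logits)
        plus[i] += eps
        minus = list(det_logits)
        minus[i] -= eps
        diff = (recognition_loss(plus, rec_feats, targets)
                - recognition_loss(minus, rec_feats, targets))
        grads.append(diff / (2 * eps))
    return grads

if __name__ == "__main__":
    det = [2.0, -1.0, 0.5]     # toy detection logits (text vs. background)
    feats = [1.5, 0.8, -0.3]   # toy recognition features
    target = [1.0, 0.0, 0.2]   # toy recognition targets
    print(grad_wrt_detection(det, feats, target))
    print(numeric_grad(det, feats, target))
```

The point of the toy is structural: because the mask is part of the recognition computation, the detection outputs receive a learning signal from the recognition loss, rather than the two tasks interacting only through shared backbone features.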
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
(2024-02-27T01:57:09Z)
- SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between the two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieved state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
(2024-01-15T12:33:00Z)
- ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer [88.61312640540902]
We introduce the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter).
Our model achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods.
(2023-08-20T03:22:23Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with a Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
(2023-06-06T03:37:41Z)
- Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer [21.479222207347238]
We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting.
TTS can be trained in both fully- and weakly-supervised settings.
When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
(2022-02-11T08:50:09Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then, a novel Shape Transform Module (STM) is designed to transform the detected feature regions into regular morphologies.
(2020-02-17T08:07:19Z)
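The Text Perceptron summary says the Shape Transform Module converts detected regions into regular morphologies but gives no further detail. As a hypothetical stand-in (not the STM itself), the sketch below rectifies a four-corner text region onto a fixed rectangular grid using bilinear interpolation of the corners followed by bilinear pixel sampling. The names `quad_grid`, `bilinear_sample`, and `rectify` are illustrative. A four-point mapping can straighten rotated or skewed quadrilaterals but not curved text, which would need more control points; SwinTextSpotter, by contrast, states that it needs no additional rectification module at all.

```python
# Hypothetical stand-in for a shape-transform step (NOT Text Perceptron's STM):
# map a quadrilateral region onto an out_h x out_w rectangle.

def quad_grid(quad, out_h, out_w):
    """Source (x, y) coordinates for each output cell.

    quad: [(x, y)] corners ordered top-left, top-right, bottom-right, bottom-left.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    grid = []
    for r in range(out_h):
        v = r / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for c in range(out_w):
            u = c / (out_w - 1) if out_w > 1 else 0.0
            # Bilinear blend of the four corners at normalized (u, v).
            x = (1 - u) * (1 - v) * x0 + u * (1 - v) * x1 + u * v * x2 + (1 - u) * v * x3
            y = (1 - u) * (1 - v) * y0 + u * (1 - v) * y1 + u * v * y2 + (1 - u) * v * y3
            row.append((x, y))
        grid.append(row)
    return grid

def bilinear_sample(img, x, y):
    """Sample a 2-D list img[y][x] at real-valued (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    xa, ya = int(x), int(y)
    xb, yb = min(xa + 1, w - 1), min(ya + 1, h - 1)
    dx, dy = x - xa, y - ya
    top = img[ya][xa] * (1 - dx) + img[ya][xb] * dx
    bottom = img[yb][xa] * (1 - dx) + img[yb][xb] * dx
    return top * (1 - dy) + bottom * dy

def rectify(img, quad, out_h, out_w):
    """Resample the quadrilateral region of img into a regular out_h x out_w grid."""
    return [[bilinear_sample(img, x, y) for (x, y) in row]
            for row in quad_grid(quad, out_h, out_w)]

if __name__ == "__main__":
    # Toy 6x6 "image" whose pixel value encodes its position: x + 10 * y.
    img = [[x + 10 * y for x in range(6)] for y in range(6)]
    # A rotated square region, reoriented into a 3x3 upright patch.
    for row in rectify(img, [(2, 0), (5, 3), (3, 5), (0, 2)], 3, 3):
        print([round(v, 2) for v in row])
```

Using a position-encoding toy image makes the effect easy to inspect: each output cell reports which source location it was sampled from, so a rotated region comes out as an upright patch.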