SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
- URL: http://arxiv.org/abs/2401.07641v1
- Date: Mon, 15 Jan 2024 12:33:00 GMT
- Title: SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
- Authors: Mingxin Huang and Dezhi Peng and Hongliang Li and Zhenghao Peng and
Chongyu Liu and Dahua Lin and Yuliang Liu and Xiang Bai and Lianwen Jin
- Abstract summary: We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between the two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieves state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
- Score: 126.01629300244001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end scene text spotting, which aims to read the text in natural
images, has garnered significant attention in recent years. However, recent
state-of-the-art methods usually incorporate detection and recognition simply
by sharing the backbone, which does not directly take advantage of the feature
interaction between the two tasks. In this paper, we propose a new end-to-end
scene text spotting framework termed SwinTextSpotter v2, which seeks to find a
better synergy between text detection and recognition. Specifically, we enhance
the relationship between the two tasks using novel Recognition Conversion and
Recognition Alignment modules. Recognition Conversion explicitly guides text
localization through recognition loss, while Recognition Alignment dynamically
extracts text features for recognition through the detection predictions. This
simple yet effective design results in a concise framework that requires
neither an additional rectification module nor character-level annotations for
arbitrarily shaped text. Furthermore, the parameters of the detector are
greatly reduced without performance degradation by introducing a Box Selection
Schedule. Qualitative and quantitative experiments demonstrate that
SwinTextSpotter v2 achieves state-of-the-art performance on various
multilingual (English, Chinese, and Vietnamese) benchmarks. The code will be
available at
\href{https://github.com/mxin262/SwinTextSpotterv2}{SwinTextSpotter v2}.
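The coupling described in the abstract, a detection head whose output gates the features fed to the recognizer, so that the recognition loss also trains the detector, can be illustrated with a minimal NumPy sketch. This is a conceptual toy, not the authors' implementation; the function name, shapes, and gating scheme are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognition_conversion(shared_feats, det_logits):
    """Gate shared backbone features with a soft detection mask.

    The recognition branch then only sees likely-text regions; in a real
    autograd framework the recognition loss would backpropagate through
    det_logits, so recognition supervision also guides localization.
    """
    mask = sigmoid(det_logits)              # (H, W) soft text mask
    return shared_feats * mask[None, :, :]  # (C, H, W) gated features

# Toy example: uniform features; only the "detected" region survives gating.
feats = np.ones((4, 8, 8))
logits = np.full((8, 8), -10.0)   # confident background everywhere
logits[2:6, 2:6] = 10.0           # one confident text region
masked = recognition_conversion(feats, logits)
```

In the gated output, feature values inside the detected region stay near 1 while background positions are suppressed toward 0, which is the synergy the paper's modules are built around.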
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer [88.61312640540902]
We introduce the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter).
Our model achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2023-08-20T03:22:23Z)
- Decoupling Recognition from Detection: Single Shot Self-Reliant Scene Text Spotter [34.09162878714425]
We propose the single-shot Self-Reliant Scene Text Spotter (SRSTS).
We conduct text detection and recognition in parallel and bridge them by the shared positive anchor point.
Our method is able to recognize the text instances correctly even though the precise text boundaries are challenging to detect.
arXiv Detail & Related papers (2022-07-15T01:59:14Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a Transformer with a dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
This design results in a concise framework that requires neither an additional rectification module nor character-level annotations.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer [21.479222207347238]
We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting.
TTS can be trained in both fully- and weakly-supervised settings.
When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
arXiv Detail & Related papers (2022-02-11T08:50:09Z)
- AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting [98.08853679310603]
This work proposes a novel text spotter, named the Ambiguity Eliminating Text Spotter (AE TextSpotter).
AE TextSpotter learns both visual and linguistic features to significantly reduce ambiguity in text detection.
To our knowledge, this is the first work to improve text detection by using a language model.
arXiv Detail & Related papers (2020-08-03T08:40:01Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.