DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting
- URL: http://arxiv.org/abs/2203.05122v1
- Date: Thu, 10 Mar 2022 02:41:05 GMT
- Title: DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting
- Authors: Seonghyeon Kim, Seung Shin, Yoonsik Kim, Han-Cheol Cho, Taeho Kil,
Jaeheung Surh, Seunghyun Park, Bado Lee, Youngmin Baek
- Abstract summary: We propose DEER, a novel Detection-agnostic End-to-End Recognizer framework.
The proposed method reduces the tight dependency between detection and recognition modules.
It achieves competitive results on regular and arbitrarily-shaped text spotting benchmarks.
- Score: 11.705454066278898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent end-to-end scene text spotters have achieved great improvement in
recognizing arbitrary-shaped text instances. Common approaches for text
spotting use region of interest pooling or segmentation masks to restrict
features to single text instances. However, this makes it hard for the
recognizer to decode correct sequences when the detection is not accurate, i.e.,
when one or more characters are cropped out. Considering that it is hard to
accurately decide word boundaries with only the detector, we propose a novel
Detection-agnostic End-to-End Recognizer, DEER, framework. The proposed method
reduces the tight dependency between detection and recognition modules by
bridging them with a single reference point for each text instance, instead of
using detected regions. The proposed method allows the decoder to recognize the
texts that are indicated by the reference point, with features from the whole
image. Since only a single point is required to recognize the text, the
proposed method enables text spotting without an arbitrarily-shaped detector or
bounding polygon annotations. Experimental results show that the proposed
method achieves competitive results on regular and arbitrarily-shaped text
spotting benchmarks. Further analysis shows that DEER is robust to
detection errors. The code and dataset will be publicly available.
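The idea of recognizing text from a single reference point, using features from the whole image rather than a cropped region, can be illustrated with a toy sketch. This is not DEER's actual architecture: the function names `bilinear_sample` and `attend_from_point` are hypothetical, and plain dot-product attention stands in for whatever attention mechanism the paper uses. It only shows how a point can select a query that then attends over every image location.

```python
import math

def bilinear_sample(feat, x, y):
    """Sample an H x W x C feature map (nested lists) at a normalized point in [0, 1]."""
    h, w = len(feat), len(feat[0])
    # Map normalized coordinates to continuous pixel coordinates.
    px, py = x * (w - 1), y * (h - 1)
    x0, y0 = int(math.floor(px)), int(math.floor(py))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = px - x0, py - y0
    return [
        (1 - ay) * ((1 - ax) * feat[y0][x0][c] + ax * feat[y0][x1][c])
        + ay * ((1 - ax) * feat[y1][x0][c] + ax * feat[y1][x1][c])
        for c in range(len(feat[0][0]))
    ]

def attend_from_point(feat, ref_point):
    """Hypothetical helper: build a query at ref_point and attend over the whole map."""
    query = bilinear_sample(feat, *ref_point)
    flat = [cell for row in feat for cell in row]  # every image location is a key/value
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, cell)) / scale for cell in flat]
    # Numerically stable softmax over all locations.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(wt * cell[c] for wt, cell in zip(weights, flat))
               for c in range(len(query))]
    return context, weights

# Toy 2 x 2 feature map with 2 channels; the reference point is the top-left corner.
feat = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]]]
context, weights = attend_from_point(feat, (0.0, 0.0))
midpoint = bilinear_sample(feat, 0.5, 0.0)
```

In this toy setup, locations whose features resemble the one under the reference point receive the largest attention weights, and no bounding box or polygon is needed to restrict the features.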
Related papers
- TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model [17.77384627944455]
Existing scene text spotters are designed to locate and transcribe texts from images.
Our proposed scene text spotter leverages advanced PLMs to enhance performance without fine-grained detection.
Benefiting from the comprehensive language knowledge gained during the pre-training phase, the PLM-based recognition module effectively handles complex scenarios.
arXiv Detail & Related papers (2024-03-15T06:38:25Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieved state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z)
- Decoupling Recognition from Detection: Single Shot Self-Reliant Scene Text Spotter [34.09162878714425]
We propose the single shot Self-Reliant Scene Text Spotter (SRSTS).
We conduct text detection and recognition in parallel and bridge them by the shared positive anchor point.
Our method is able to recognize the text instances correctly even though the precise text boundaries are challenging to detect.
arXiv Detail & Related papers (2022-07-15T01:59:14Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter [37.86206423441885]
We present a simple yet robust end-to-end text spotting framework, termed Auto-Rectification Text Spotter (ARTS).
Our method achieves 77.1% end-to-end text spotting F-measure on Total-Text at a competitive speed of 10.5 FPS.
arXiv Detail & Related papers (2021-10-20T06:53:44Z)
- Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images [8.180563824325086]
We propose a simple yet effective method for accurate arbitrary-shaped nearby scene text detection.
A One-to-Many Training Scheme (OMTS) is designed to eliminate confusion and enable the proposals to learn more appropriate groundtruths.
We also propose a Proposal Feature Attention Module (PFAM) to exploit more effective features for each proposal.
arXiv Detail & Related papers (2021-09-08T06:25:37Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts with various shapes and requires low labeling costs.
Experiments show that the proposed method bridges the performance gap between weak labeling methods and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
- AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting [98.08853679310603]
This work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter).
AE TextSpotter learns both visual and linguistic features to significantly reduce ambiguity in text detection.
To our knowledge, this is the first work to improve text detection by using a language model.
arXiv Detail & Related papers (2020-08-03T08:40:01Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.