Which and Where to Focus: A Simple yet Accurate Framework for
Arbitrary-Shaped Nearby Text Detection in Scene Images
- URL: http://arxiv.org/abs/2109.03451v1
- Date: Wed, 8 Sep 2021 06:25:37 GMT
- Title: Which and Where to Focus: A Simple yet Accurate Framework for
Arbitrary-Shaped Nearby Text Detection in Scene Images
- Authors: Youhui Guo, Yu Zhou, Xugong Qin, Weiping Wang
- Abstract summary: We propose a simple yet effective method for accurate arbitrary-shaped nearby scene text detection.
A One-to-Many Training Scheme (OMTS) is designed to eliminate confusion and enable the proposals to learn more appropriate groundtruths.
We also propose a Proposal Feature Attention Module (PFAM) to exploit more effective features for each proposal.
- Score: 8.180563824325086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text detection has drawn the close attention of researchers. Though
many methods have been proposed for horizontal and oriented texts, previous
methods may not perform well when dealing with arbitrary-shaped texts such as
curved texts. In particular, a confusion problem arises in the case of nearby
text instances. In this paper, we propose a simple yet effective method for
accurate arbitrary-shaped nearby scene text detection. Firstly, a One-to-Many
Training Scheme (OMTS) is designed to eliminate confusion and enable the
proposals to learn more appropriate groundtruths in the case of nearby text
instances. Secondly, we propose a Proposal Feature Attention Module (PFAM) to
exploit more effective features for each proposal, which can better adapt to
arbitrary-shaped text instances. Finally, we propose a baseline that is based
on Faster R-CNN and outputs the curve representation directly. Equipped with
PFAM and OMTS, the detector can achieve state-of-the-art or competitive
performance on several challenging benchmarks.
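The confusion that the abstract attributes to nearby text instances can be made concrete with a small sketch. The abstract does not spell out how OMTS assigns groundtruths, so the code below is not the authors' scheme. It only contrasts the usual max-IoU target assignment with a list of every groundtruth that a proposal overlaps noticeably. The function names, the axis-aligned box format, and the 0.3 threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); a simplification chosen for this sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_one_target(proposal: Box, gts: List[Box]) -> int:
    """Index of the single groundtruth with the highest IoU (max-IoU assignment)."""
    ious = [iou(proposal, g) for g in gts]
    return max(range(len(gts)), key=ious.__getitem__)


def candidate_targets(proposal: Box, gts: List[Box], thresh: float = 0.3) -> List[int]:
    """Indices of all groundtruths whose IoU with the proposal reaches thresh.

    Hypothetical helper: it only lists the competing groundtruths and does not
    implement the paper's One-to-Many Training Scheme.
    """
    return [i for i, g in enumerate(gts) if iou(proposal, g) >= thresh]


# Two text lines stacked directly on top of each other.
gts = [(0.0, 0.0, 100.0, 20.0), (0.0, 20.0, 100.0, 40.0)]
# A proposal that straddles the boundary between the two lines.
proposal = (0.0, 8.0, 100.0, 31.0)

print([round(iou(proposal, g), 3) for g in gts])
print(one_to_one_target(proposal, gts))
print(candidate_targets(proposal, gts))
```

Here the proposal overlaps both lines to a similar degree, so a single max-IoU target discards a groundtruth that is almost as good a match. This is the kind of ambiguity a one-to-many scheme is meant to address.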
Related papers
- Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
arXiv Detail & Related papers (2024-05-13T05:48:35Z)
- TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model [17.77384627944455]
Existing scene text spotters are designed to locate and transcribe texts from images.
Our proposed scene text spotter leverages advanced PLMs to enhance performance without fine-grained detection.
Benefiting from the comprehensive language knowledge gained during the pre-training phase, the PLM-based recognition module effectively handles complex scenarios.
arXiv Detail & Related papers (2024-03-15T06:38:25Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting [11.705454066278898]
We propose a novel Detection-agnostic End-to-End Recognizer, DEER, framework.
The proposed method reduces the tight dependency between detection and recognition modules.
It achieves competitive results on regular and arbitrarily-shaped text spotting benchmarks.
arXiv Detail & Related papers (2022-03-10T02:41:05Z)
- MOST: A Multi-Oriented Scene Text Detector with Localization Refinement [67.35280008722255]
We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization.
Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features.
A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to exclude unreliable ones.
arXiv Detail & Related papers (2021-04-02T14:34:41Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts with various shapes and requires low labeling costs.
Experiments show that the proposed method bridges the performance gap between the weakly labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
- ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection [147.10751375922035]
We propose the ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with high response value in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.