MT: Multi-Perspective Feature Learning Network for Scene Text Detection
- URL: http://arxiv.org/abs/2105.05455v1
- Date: Wed, 12 May 2021 06:41:34 GMT
- Title: MT: Multi-Perspective Feature Learning Network for Scene Text Detection
- Authors: Chuang Yang, Mulin Chen, Yuan Yuan (Senior Member, IEEE), and Qi Wang (Senior Member, IEEE)
- Abstract summary: A light-weight detection framework is designed to speed up the inference process while keeping high detection accuracy.
A multi-perspective feature module is proposed to learn more discriminative representations to segment the mask accurately.
The effectiveness of MT is evaluated on four real-world scene text datasets.
- Score: 9.282254601960613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text detection, the key technology for understanding scene text, has become
an attractive research topic. For detecting various scene texts, researchers
propose plenty of detectors with different advantages: detection-based models
enjoy fast detection speed, and segmentation-based algorithms are not limited
by text shapes. However, for most intelligent systems, the detector needs to
detect arbitrary-shaped texts with high speed and accuracy simultaneously.
Thus, in this study, we design an efficient pipeline named MT, which can
detect adhesive arbitrary-shaped texts with only a single binary mask in the
inference stage. This paper makes contributions in three aspects: (1) a
light-weight detection framework is designed to speed up the inference process
while keeping high detection accuracy; (2) a multi-perspective feature module
is proposed to learn more discriminative representations to segment the mask
accurately; (3) a multi-factor constraints IoU minimization loss is introduced
for training the proposed model. The effectiveness of MT is evaluated on four
real-world scene text datasets, and it surpasses all the state-of-the-art
competitors to a large extent.
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performances across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z)
- TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model [17.77384627944455]
Existing scene text spotters are designed to locate and transcribe texts from images.
Our proposed scene text spotter leverages advanced PLMs to enhance performance without fine-grained detection.
Benefiting from the comprehensive language knowledge gained during the pre-training phase, the PLM-based recognition module effectively handles complex scenarios.
arXiv Detail & Related papers (2024-03-15T06:38:25Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C).
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
arXiv Detail & Related papers (2021-10-12T02:36:48Z)
- Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images [8.180563824325086]
We propose a simple yet effective method for accurate arbitrary-shaped nearby scene text detection.
A One-to-Many Training Scheme (OMTS) is designed to eliminate confusion and enable the proposals to learn more appropriate groundtruths.
We also propose a Proposal Feature Attention Module (PFAM) to exploit more effective features for each proposal.
arXiv Detail & Related papers (2021-09-08T06:25:37Z)
- MOST: A Multi-Oriented Scene Text Detector with Localization Refinement [67.35280008722255]
We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization.
Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features.
A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to exclude unreliable detections.
arXiv Detail & Related papers (2021-04-02T14:34:41Z)
- DGST: Discriminator Guided Scene Text detector [11.817428636084305]
This paper proposes a detector framework based on the conditional generative adversarial networks to improve the segmentation effect of scene text detection.
Experiments on standard datasets demonstrate that the proposed DGST brings noticeable gain and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-02-28T01:47:36Z)