MOST: A Multi-Oriented Scene Text Detector with Localization Refinement
- URL: http://arxiv.org/abs/2104.01070v2
- Date: Mon, 5 Apr 2021 08:52:47 GMT
- Title: MOST: A Multi-Oriented Scene Text Detector with Localization Refinement
- Authors: Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai
- Abstract summary: We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization.
Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features.
A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to selectively concentrate on reliable raw detections and exclude unreliable ones.
- Score: 67.35280008722255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past few years, the field of scene text detection has progressed
so rapidly that modern text detectors are able to locate text in a wide range of
challenging scenarios. However, they may still fall short when handling text
instances with extreme aspect ratios and varying scales. To tackle these
difficulties, in this paper we propose a new algorithm for scene text detection
that puts forward a set of strategies to significantly improve the quality of
text localization. Specifically, a Text Feature Alignment Module (TFAM) is
proposed to dynamically adjust the receptive fields of features based on initial
raw detections; a Position-Aware Non-Maximum Suppression (PA-NMS) module is
devised to selectively concentrate on reliable raw detections and exclude
unreliable ones; and an Instance-wise IoU loss is proposed for balanced training
across text instances of different scales. An extensive ablation study
demonstrates the effectiveness and superiority of the proposed strategies. The
resulting text detection system, which integrates these strategies with the
leading scene text detector EAST, achieves state-of-the-art or competitive
performance on various standard text detection benchmarks while keeping a fast
running speed.
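The abstract names an Instance-wise IoU loss for balanced training across text scales but gives no formula. The following Python snippet is only a minimal sketch under assumptions that are ours, not stated in the text above:

- It assumes EAST-style per-pixel regression of distances to the four box edges, with the rotation angle term omitted.
- It reads "instance-wise" as averaging the per-pixel -log(IoU) loss within each text instance first, then averaging across instances.

The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed representation: distances (top, right, bottom, left) from a text
# pixel to the four edges of its box, as in EAST's RBOX geometry; the rotation
# angle term is omitted to keep the sketch short.
Dists = Tuple[float, float, float, float]


def pixel_iou_loss(pred: Dists, gt: Dists, eps: float = 1e-6) -> float:
    """Return -log(IoU) between the boxes implied by predicted and true distances."""
    p_t, p_r, p_b, p_l = pred
    g_t, g_r, g_b, g_l = gt
    area_pred = (p_t + p_b) * (p_l + p_r)
    area_gt = (g_t + g_b) * (g_l + g_r)
    # Both boxes contain the same pixel, so the overlap along each axis is the
    # sum of the smaller distances on either side of that pixel.
    h_inter = min(p_t, g_t) + min(p_b, g_b)
    w_inter = min(p_l, g_l) + min(p_r, g_r)
    inter = h_inter * w_inter
    union = area_pred + area_gt - inter
    return -math.log((inter + eps) / (union + eps))


def pixelwise_iou_loss(preds: Sequence[Dists], gts: Sequence[Dists]) -> float:
    """Plain mean over all text pixels: instances with many pixels dominate."""
    losses = [pixel_iou_loss(p, g) for p, g in zip(preds, gts)]
    return sum(losses) / len(losses)


def instancewise_iou_loss(
    preds: Sequence[Dists], gts: Sequence[Dists], instance_ids: Sequence[int]
) -> float:
    """Hypothetical instance-wise variant: mean within each instance, then across instances."""
    per_instance: Dict[int, List[float]] = defaultdict(list)
    for p, g, k in zip(preds, gts, instance_ids):
        per_instance[k].append(pixel_iou_loss(p, g))
    # Each instance contributes equally, however many pixels it covers.
    means = [sum(v) / len(v) for v in per_instance.values()]
    return sum(means) / len(means)


if __name__ == "__main__":
    # Instance 0: a large text instance (9 pixels) predicted perfectly.
    # Instance 1: a small instance (1 pixel) whose predicted box is 2x too wide and tall.
    preds = [(2.0, 4.0, 2.0, 4.0)] * 9 + [(2.0, 2.0, 2.0, 2.0)]
    gts = [(2.0, 4.0, 2.0, 4.0)] * 9 + [(1.0, 1.0, 1.0, 1.0)]
    ids = [0] * 9 + [1]
    print(f"pixel-wise:    {pixelwise_iou_loss(preds, gts):.3f}")
    print(f"instance-wise: {instancewise_iou_loss(preds, gts, ids):.3f}")
```

In this toy example the poorly localized one-pixel instance yields a loss of about 0.139 under pixel-wise averaging, but about 0.693 under instance-wise averaging. This illustrates how per-instance normalization can keep small text from being swamped by large instances.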
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
(arXiv, 2024-10-04T18:42:09Z)
- Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
(arXiv, 2024-05-13T05:48:35Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
(arXiv, 2023-06-06T03:37:41Z)
- CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning [65.57338873921168]
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision.
In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, the COntrastive RElation (CORE) module.
We integrate the CORE module into a two-stage Mask R-CNN text detector to build our text detector, CORE-Text.
(arXiv, 2021-12-14T16:22:25Z)
- Attention-based Feature Decomposition-Reconstruction Network for Scene Text Detection [20.85468268945721]
We propose an attention-based feature decomposition-reconstruction network for scene text detection.
We use contextual information and low-level features to enhance the performance of a segmentation-based text detector.
Experiments have been conducted on two public benchmark datasets and results show that our proposed method achieves state-of-the-art performance.
(arXiv, 2021-11-29T06:15:25Z)
- ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter [37.86206423441885]
We present a simple yet robust end-to-end text spotting framework, termed Auto-Rectification Text Spotter (ARTS).
Our method achieves 77.1% end-to-end text spotting F-measure on Total-Text at a competitive speed of 10.5 FPS.
(arXiv, 2021-10-20T06:53:44Z)
- On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C).
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
(arXiv, 2021-10-12T02:36:48Z)
- Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images [8.180563824325086]
We propose a simple yet effective method for accurate arbitrary-shaped nearby scene text detection.
A One-to-Many Training Scheme (OMTS) is designed to eliminate confusion and enable the proposals to learn more appropriate ground truths.
We also propose a Proposal Feature Attention Module (PFAM) to exploit more effective features for each proposal.
(arXiv, 2021-09-08T06:25:37Z)
- MT: Multi-Perspective Feature Learning Network for Scene Text Detection [9.282254601960613]
A light-weight detection framework is designed to speed up the inference process while keeping high detection accuracy.
A multi-perspective feature module is proposed to learn more discriminative representations to segment the mask accurately.
The effectiveness of MT is evaluated on four real-world scene text datasets.
(arXiv, 2021-05-12T06:41:34Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (STM) is designed to transform the detected feature regions into regular morphologies.
(arXiv, 2020-02-17T08:07:19Z)