Research on Multilingual Natural Scene Text Detection Algorithm
- URL: http://arxiv.org/abs/2312.11153v2
- Date: Fri, 5 Jan 2024 08:41:06 GMT
- Title: Research on Multilingual Natural Scene Text Detection Algorithm
- Authors: Tao Wang
- Abstract summary: We propose a multilingual text detection model to address the issues of low accuracy and high difficulty in detecting multilingual text in natural scenes.
We introduce the SFM Swin Transformer feature extraction network to enhance the model's robustness in detecting characters and fonts across different languages.
To overcome this, we propose a Global Semantic Branch, extracting and preserving global features for more effective text detection.
- Score: 4.514028820667202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural scene text detection is a significant challenge in computer vision,
with tremendous potential applications in multilingual, diverse, and complex
text scenarios. We propose a multilingual text detection model to address the
issues of low accuracy and high difficulty in detecting multilingual text in
natural scenes. In response to the challenges posed by multilingual text images
with multiple character sets and various font styles, we introduce the SFM Swin
Transformer feature extraction network to enhance the model's robustness in
detecting characters and fonts across different languages. Dealing with the
considerable variation in text scales and complex arrangements in natural scene
text images, we present the AS-HRFPN feature fusion network by incorporating an
Adaptive Spatial Feature Fusion module and a Spatial Pyramid Pooling module.
The feature fusion network improvements enhance the model's ability to detect
text sizes and orientations. Addressing diverse backgrounds and font variations
in multilingual scene text images is a challenge for existing methods. Limited
local receptive fields hinder detection performance. To overcome this, we
propose a Global Semantic Segmentation Branch, extracting and preserving global
features for more effective text detection, aligning with the need for
comprehensive information. In this study, we collected and built a real-world
multilingual natural scene text image dataset and conducted comprehensive
experiments and analyses. The experimental results demonstrate that the
proposed algorithm achieves an F-measure of 85.02\%, which is 4.71\% higher
than the baseline model. We also conducted extensive cross-dataset validation
on MSRA-TD500, ICDAR2017MLT, and ICDAR2015 datasets to verify the generality of
our approach. The code and dataset can be found at
https://github.com/wangmelon/CEMLT.
Related papers
- Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera [31.180352896153682]
We propose an effective spotlight text detector (STD) for scene texts.
It consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM)
Our STD is superior to existing state-of-the-art methods on various datasets.
arXiv Detail & Related papers (2024-09-25T11:19:09Z) - Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performances across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z) - Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
arXiv Detail & Related papers (2024-05-13T05:48:35Z) - TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding [91.30065932213758]
Large Multimodal Models (LMMs) have sparked a surge in research aimed at harnessing their remarkable reasoning abilities.
We propose TextCoT, a novel Chain-of-Thought framework for text-rich image understanding.
Our method is free of extra training, offering immediate plug-and-play functionality.
arXiv Detail & Related papers (2024-04-15T13:54:35Z) - Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using
Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z) - Deformation Robust Text Spotting with Geometric Prior [5.639053898266709]
We develop a robust text spotting method (DR TextSpotter) to solve the recognition problem of complex deformation of characters in different fonts.
A graph convolution network is constructed to fuse the character features and landmark features, and then performs semantic reasoning to enhance the discrimination for different characters.
arXiv Detail & Related papers (2023-08-31T02:13:15Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z) - Aggregated Text Transformer for Scene Text Detection [5.387121933662753]
We present the Aggregated Text TRansformer(ATTR), which is designed to represent texts in scene images with a multi-scale self-attention mechanism.
The multi-scale image representations are robust and contain rich information on text contents of various sizes.
The proposed method detects scene texts by representing each text instance as an individual binary mask, which is tolerant of curve texts and regions with dense instances.
arXiv Detail & Related papers (2022-11-25T09:47:34Z) - On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C)
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
arXiv Detail & Related papers (2021-10-12T02:36:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.