BPDO: Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection
- URL: http://arxiv.org/abs/2401.09997v1
- Date: Thu, 18 Jan 2024 14:13:46 GMT
- Title: BPDO: Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection
- Authors: Jinzhi Zheng, Libo Zhang, Yanjun Wu, Chen Zhao
- Abstract summary: We propose a novel arbitrary shape scene text detector through boundary points dynamic optimization (BPDO).
The model is designed with a text aware module (TAM) and a boundary point dynamic optimization module (DOM).
Experiments on CTW-1500, Total-Text, and MSRA-TD500 datasets show that the model proposed in this paper achieves a performance better than or comparable to the state-of-the-art algorithm.
- Score: 19.574306663095243
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Arbitrary shape scene text detection is of great importance in scene
understanding tasks. Due to the complexity and diversity of text in natural
scenes, existing scene text algorithms have limited accuracy for detecting
arbitrary shape text. In this paper, we propose a novel arbitrary shape scene
text detector through boundary points dynamic optimization (BPDO). The proposed
model is designed with a text aware module (TAM) and a boundary point dynamic
optimization module (DOM). Specifically, the model designs a text aware module
based on segmentation to obtain boundary points describing the central region
of the text by extracting a priori information about the text region. Then,
based on the idea of deformable attention, it proposes a dynamic optimization
model for boundary points, which gradually optimizes the exact position of the
boundary points based on the information of the adjacent region of each
boundary point. Experiments on CTW-1500, Total-Text, and MSRA-TD500 datasets
show that the model proposed in this paper achieves a performance that is
better than or comparable to the state-of-the-art algorithm, proving the
effectiveness of the model.
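
The abstract describes the boundary point refinement only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation. It shows one way a deformable-attention-style layer could move each boundary point using features sampled from its neighbourhood. Everything named here (`ToyPointRefiner`, `bilinear_sample`, `refine`, the number of sampling locations, and the random weights that stand in for learned parameters) is an assumption made for illustration. The initial points stand in for the output of a segmentation stage such as the TAM.

```python
import numpy as np


def bilinear_sample(feat, pts):
    """Bilinearly sample feat (H, W, C) at float (x, y) locations pts (..., 2)."""
    h, w, _ = feat.shape
    x = np.clip(pts[..., 0], 0, w - 1)
    y = np.clip(pts[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ToyPointRefiner:
    """One refinement layer. All weights are random stand-ins (assumption);
    in a trained model they would be learned."""

    def __init__(self, channels, num_samples=4, seed=0):
        rng = np.random.default_rng(seed)
        self.k = num_samples
        self.w_offset = rng.normal(0, 0.1, (channels, num_samples * 2))
        self.w_attn = rng.normal(0, 0.1, (channels, num_samples))
        self.w_delta = rng.normal(0, 0.1, (channels, 2))

    def __call__(self, feat, points):
        # Feature at each boundary point acts as the query.
        query = bilinear_sample(feat, points)                      # (N, C)
        # Predict K sampling offsets and K attention weights per point.
        offsets = (query @ self.w_offset).reshape(-1, self.k, 2)   # (N, K, 2)
        attn = softmax(query @ self.w_attn)                        # (N, K)
        # Gather features from the neighbourhood of each point.
        sampled = bilinear_sample(feat, points[:, None, :] + offsets)  # (N, K, C)
        context = (attn[..., None] * sampled).sum(axis=1)          # (N, C)
        # Turn the aggregated context into a displacement of the point.
        return points + context @ self.w_delta                     # (N, 2)


def refine(feat, points, layers):
    """Apply the refinement layers one after another."""
    for layer in layers:
        points = layer(feat, points)
    return points


# Toy inputs: a random feature map and an ellipse of 12 initial points.
feat = np.random.default_rng(1).normal(size=(32, 64, 8))
theta = np.linspace(0, 2 * np.pi, 12, endpoint=False)
init = np.stack([32 + 10 * np.cos(theta), 16 + 4 * np.sin(theta)], axis=1)
layers = [ToyPointRefiner(8, seed=s) for s in range(3)]
refined = refine(feat, init, layers)
print(refined.shape)  # (12, 2)
```

Because the weights are untrained, the resulting positions are meaningless; the sketch only illustrates the data flow of sampling around each point, weighting the samples, and updating the point over several stages.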
Related papers
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performances across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z)
- Text Region Multiple Information Perception Network for Scene Text Detection [19.574306663095243]
This paper proposes a plug-and-play module called the Region Multiple Information Perception Module (RMIPM) to enhance the detection performance of segmentation-based algorithms.
Specifically, we design an improved module that can perceive various types of information about scene text regions, such as text foreground classification maps, distance maps, direction maps, etc.
arXiv Detail & Related papers (2024-01-18T14:36:51Z)
- Adaptive Segmentation Network for Scene Text Detection [0.0]
We propose to automatically learn the discriminate segmentation threshold, which distinguishes text pixels from background pixels for segmentation-based scene text detectors.
Besides, we design a Global-information Enhanced Feature Pyramid Network (GE-FPN) for capturing text instances with macro size and extreme aspect ratios.
Finally, together with the proposed threshold learning strategy and text detection structure, we design an Adaptive Network (ASNet) for scene text detection.
arXiv Detail & Related papers (2023-07-27T17:37:56Z)
- Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement [59.66539728681453]
Scene text image super-resolution (STISR) aims to improve image quality while boosting downstream scene text recognition accuracy.
Most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process.
We propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution.
arXiv Detail & Related papers (2023-07-19T05:08:47Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Deformable Kernel Expansion Model for Efficient Arbitrary-shaped Scene Text Detection [15.230957275277762]
We propose a scene text detector named Deformable Kernel Expansion (DKE).
DKE employs a segmentation module to segment the shrunken text region as the text kernel, then expands the text kernel contour to obtain text boundary.
Experiments on CTW1500, Total-Text, MSRA-TD500, and ICDAR2015 demonstrate that DKE achieves a good tradeoff between accuracy and efficiency in scene text detection.
arXiv Detail & Related papers (2023-03-28T05:18:58Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form used implies the human reading order, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
arXiv Detail & Related papers (2022-07-10T15:45:16Z)
- Arbitrary Shape Text Detection using Transformers [2.294014185517203]
We propose an end-to-end trainable architecture for arbitrary-shaped text detection using Transformers (DETR).
At its core, our proposed method leverages a bounding box loss function that accurately measures the arbitrary detected text regions' changes in scale and aspect ratio.
We evaluate our proposed model using Total-Text and CTW-1500 datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for multi-oriented text.
arXiv Detail & Related papers (2022-02-22T22:36:29Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection [18.491440228386313]
We propose a novel adaptive boundary proposal network for arbitrary shape text detection.
Our method can learn to directly produce accurate boundary for arbitrary shape text without any post-processing.
arXiv Detail & Related papers (2021-07-27T08:25:24Z)