Kernel Proposal Network for Arbitrary Shape Text Detection
- URL: http://arxiv.org/abs/2203.06410v2
- Date: Tue, 20 Jun 2023 03:18:52 GMT
- Title: Kernel Proposal Network for Arbitrary Shape Text Detection
- Authors: Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chun Yang, Xu-Cheng Yin
- Abstract summary: We propose an innovative Kernel Proposal Network (dubbed KPN) for arbitrary shape text detection.
The proposed KPN can separate neighboring text instances by classifying different texts into instance-independent feature maps.
Our work is the first to introduce the dynamic convolution kernel strategy to efficiently and effectively tackle the adhesion problem of neighboring text instances in text detection.
- Score: 18.561812622368763
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Segmentation-based methods have achieved great success for arbitrary shape
text detection. However, separating neighboring text instances is still one of
the most challenging problems due to the complexity of texts in scene images.
In this paper, we propose an innovative Kernel Proposal Network (dubbed KPN)
for arbitrary shape text detection. The proposed KPN can separate neighboring
text instances by classifying different texts into instance-independent feature
maps, meanwhile avoiding the complex aggregation process existing in
segmentation-based arbitrary shape text detection methods. To be concrete, our
KPN will predict a Gaussian center map for each text image, which will be used
to extract a series of candidate kernel proposals (i.e., dynamic convolution
kernel) from the embedding feature maps according to their corresponding
keypoint positions. To enforce the independence between kernel proposals, we
propose a novel orthogonal learning loss (OLL) via orthogonal constraints.
Specifically, our kernel proposals contain important self-information learned
by network and location information by position embedding. Finally, kernel
proposals will individually convolve all embedding feature maps for generating
individual embedded maps of text instances. In this way, our KPN can
effectively separate neighboring text instances and improve the robustness
against unclear boundaries. To our knowledge, our work is the first to
introduce the dynamic convolution kernel strategy to efficiently and
effectively tackle the adhesion problem of neighboring text instances in text
detection. Experimental results on challenging datasets verify the impressive
performance and efficiency of our method. The code and model are available at
https://github.com/GXYM/KPN.
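The pipeline in the abstract (sample a kernel proposal at each predicted Gaussian-center keypoint, keep proposals mutually independent via the orthogonal loss, then convolve each proposal over the embedding maps) can be illustrated with a minimal sketch. All function names and the toy data below are hypothetical, not the released code:

```python
# Hypothetical sketch of the KPN idea; the real model is a fully
# convolutional network (see https://github.com/GXYM/KPN).

def sample_kernel(embedding, y, x):
    """Kernel proposal = the C-dim embedding vector at a predicted
    Gaussian-center keypoint (y, x)."""
    return [embedding[c][y][x] for c in range(len(embedding))]

def instance_map(embedding, kernel):
    """1x1 dynamic convolution: dot the kernel proposal with every
    pixel of the embedding maps to get one instance-specific map."""
    C, H, W = len(embedding), len(embedding[0]), len(embedding[0][0])
    return [[sum(kernel[c] * embedding[c][i][j] for c in range(C))
             for j in range(W)] for i in range(H)]

def orthogonal_loss(kernels):
    """Orthogonal-constraint penalty in the spirit of the OLL: sum of
    squared pairwise dot products between kernel proposals."""
    return sum(sum(a * b for a, b in zip(kernels[i], kernels[j])) ** 2
               for i in range(len(kernels))
               for j in range(i + 1, len(kernels)))

# Toy 2-channel, 1x4 embedding map holding two adjacent "instances":
# the left pixels align with channel 0, the right ones with channel 1.
emb = [
    [[1.0, 1.0, 0.0, 0.0]],  # channel 0
    [[0.0, 0.0, 1.0, 1.0]],  # channel 1
]
k_left = sample_kernel(emb, 0, 0)   # keypoint inside the left instance
k_right = sample_kernel(emb, 0, 3)  # keypoint inside the right instance

# Each kernel responds only on its own instance's pixels, so the two
# neighboring instances land on separate maps instead of adhering.
left_map = instance_map(emb, k_left)    # [[1.0, 1.0, 0.0, 0.0]]
right_map = instance_map(emb, k_right)  # [[0.0, 0.0, 1.0, 1.0]]
loss = orthogonal_loss([k_left, k_right])  # 0.0: proposals orthogonal
```

In the full method, position embeddings are mixed into the kernel proposals, so two visually similar texts at different locations still yield distinguishable kernels.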
Related papers
- Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera [31.180352896153682]
We propose an effective spotlight text detector (STD) for scene texts.
It consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM).
Our STD is superior to existing state-of-the-art methods on various datasets.
arXiv Detail & Related papers (2024-09-25T11:19:09Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images [8.180563824325086]
We propose a simple yet effective method for accurate arbitrary-shaped nearby scene text detection.
A One-to-Many Training Scheme (OMTS) is designed to eliminate confusion and enable the proposals to learn more appropriate groundtruths.
We also propose a Proposal Feature Attention Module (PFAM) to exploit more effective features for each proposal.
arXiv Detail & Related papers (2021-09-08T06:25:37Z)
- CentripetalText: An Efficient Text Instance Representation for Scene Text Detection [19.69057252363207]
We propose an efficient text instance representation named CentripetalText (CT).
CT decomposes text instances into the combination of text kernels and centripetal shifts.
For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods.
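The kernel-plus-shift decomposition above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: each text pixel predicts a centripetal shift pointing toward its instance's kernel, and grouping a pixel reduces to checking which kernel its shifted position lands in.

```python
def group_pixels(text_pixels, shifts, kernel_id):
    """Assign each text pixel the id of the kernel its shift points to.

    text_pixels: list of (y, x) text-pixel coordinates
    shifts: dict (y, x) -> (dy, dx) predicted centripetal shift
    kernel_id: dict (y, x) -> instance id for kernel pixels
    returns: dict (y, x) -> instance id (0 if the shift misses a kernel)
    """
    labels = {}
    for (y, x) in text_pixels:
        dy, dx = shifts[(y, x)]
        labels[(y, x)] = kernel_id.get((y + dy, x + dx), 0)
    return labels

# Toy example: two kernel pixels on one row; the surrounding text
# pixels shift inward toward their own kernel.
kernels = {(0, 1): 1, (0, 4): 2}
pixels = [(0, 0), (0, 2), (0, 3), (0, 5)]
shifts = {(0, 0): (0, 1), (0, 2): (0, -1),
          (0, 3): (0, 1), (0, 5): (0, -1)}
labels = group_pixels(pixels, shifts, kernels)
# (0, 0) and (0, 2) -> instance 1; (0, 3) and (0, 5) -> instance 2
```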
arXiv Detail & Related papers (2021-07-13T09:34:18Z)
- PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text [85.7020597476857]
We propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes.
PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels.
As a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is very friendly to real-time applications.
arXiv Detail & Related papers (2021-05-02T07:04:30Z)
- Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting [71.6244869235243]
Most arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals.
Our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy won't be affected by nearby text or background noise.
arXiv Detail & Related papers (2020-07-18T17:25:50Z)
- All you need is a second look: Towards Tighter Arbitrary shape text detection [80.85188469964346]
Long curve text instances tend to be fragmented because of the limited receptive field size of CNN.
Simple representations using rectangle or quadrangle bounding boxes fall short when dealing with more challenging arbitrary-shaped texts.
NASK reconstructs text instances with a tighter representation using the predicted geometrical attributes.
arXiv Detail & Related papers (2020-04-26T17:03:41Z)
- ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection [147.10751375922035]
We propose the ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with high response value in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.