PAN++: Towards Efficient and Accurate End-to-End Spotting of
Arbitrarily-Shaped Text
- URL: http://arxiv.org/abs/2105.00405v1
- Date: Sun, 2 May 2021 07:04:30 GMT
- Title: PAN++: Towards Efficient and Accurate End-to-End Spotting of
Arbitrarily-Shaped Text
- Authors: Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang,
Tong Lu, Chunhua Shen
- Abstract summary: We propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes.
PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels.
As a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is very friendly to real-time applications.
- Score: 85.7020597476857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene text detection and recognition have been well explored in the past few
years. Despite the progress, efficient and accurate end-to-end spotting of
arbitrarily-shaped text remains challenging. In this work, we propose an
end-to-end text spotting framework, termed PAN++, which can efficiently detect
and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the
kernel representation that reformulates a text line as a text kernel (central
region) surrounded by peripheral pixels. By systematically comparing with
existing scene text representations, we show that our kernel representation can
not only describe arbitrarily-shaped text but also distinguish adjacent text
well. Moreover, as a pixel-based representation, the kernel representation can
be predicted by a single fully convolutional network, which is very friendly to
real-time applications. Taking advantage of the kernel representation, we
design a series of components as follows: 1) a computationally efficient
feature enhancement network composed of stacked Feature Pyramid Enhancement
Modules (FPEMs); 2) a lightweight detection head cooperating with Pixel
Aggregation (PA); and 3) an efficient attention-based recognition head with
Masked RoI. Benefiting from the kernel representation and the tailored
components, our method achieves high inference speed while maintaining
competitive accuracy. Extensive experiments show the superiority of our method.
For example, the proposed PAN++ achieves an end-to-end text spotting F-measure
of 64.9 at 29.2 FPS on the Total-Text dataset, which significantly outperforms
the previous best method. Code will be available at: https://git.io/PAN.
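To make the kernel representation concrete, here is a minimal sketch of the post-processing idea: predicted text kernels (shrunk central regions) are labeled as connected components and then grown outward over the full text mask, so that adjacent text lines stay separated by their distinct kernels. This is a simplified stand-in for Pixel Aggregation, which in the actual paper additionally guides the growth with learned similarity vectors; the function name and array layout here are illustrative assumptions, not the paper's code.

```python
from collections import deque
import numpy as np

def spot_instances(text_mask, kernel_mask):
    """Grow labeled kernels over the text mask (simplified stand-in
    for Pixel Aggregation; no learned similarity vectors).

    text_mask, kernel_mask: 2D bool arrays, kernel_mask being a shrunk
    subset of text_mask. Returns an int label map (0 = background).
    """
    h, w = text_mask.shape
    labels = np.zeros((h, w), dtype=int)
    queue = deque()
    next_label = 0
    # 4-connected component labeling of the kernels; every kernel pixel
    # seeds the breadth-first expansion queue.
    for y in range(h):
        for x in range(w):
            if kernel_mask[y, x] and labels[y, x] == 0:
                next_label += 1
                labels[y, x] = next_label
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    queue.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and kernel_mask[ny, nx]
                                and labels[ny, nx] == 0):
                            labels[ny, nx] = next_label
                            stack.append((ny, nx))
    # Breadth-first growth: each kernel claims the surrounding,
    # still-unlabeled text pixels.
    while queue:
        cy, cx = queue.popleft()
        for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                       (cy, cx - 1), (cy, cx + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and text_mask[ny, nx] and labels[ny, nx] == 0):
                labels[ny, nx] = labels[cy, cx]
                queue.append((ny, nx))
    return labels
```

With two adjacent text blobs that each contain their own kernel pixel, the growth assigns the blobs distinct labels even though they are close together, which is the property the abstract claims for the kernel representation.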
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning [19.856492291263102]
We propose representation learning for real-time scene text detection.
For semantic representation learning, we propose global-dense semantic contrast (GDSC) and top-down modeling (TDM).
With the proposed GDSC and TDM, the encoder network learns stronger representation without introducing any parameters and computations during inference.
The proposed method achieves 87.2% F-measure with 48.2 FPS on Total-Text and 89.6% F-measure with 36.9 FPS on MSRA-TD500.
arXiv Detail & Related papers (2023-08-14T15:14:37Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
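The low-rank idea above can be sketched with a truncated SVD over a set of aligned contours: each contour, flattened to a point vector, is compressed to a few coefficients in a shared shape basis. This is a generic illustration of low-rank shape approximation under assumed data layout, not LRANet's actual parameterization; the function names are hypothetical.

```python
import numpy as np

def low_rank_contour_basis(contours, rank):
    """contours: (N, 2K) array of flattened, point-aligned contours.
    Returns (mean, basis); the rows of basis span the top-`rank`
    shape modes found by SVD of the centered contour matrix."""
    X = np.asarray(contours, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:rank]

def encode(contour, mean, basis):
    """Compress one contour to `rank` coefficients."""
    return (np.asarray(contour, dtype=float) - mean) @ basis.T

def decode(coeffs, mean, basis):
    """Reconstruct an approximate contour from its coefficients."""
    return mean + np.asarray(coeffs, dtype=float) @ basis
```

If the contours truly lie near a low-dimensional subspace, as the abstract's "shape correlation among different text contours" suggests, a handful of coefficients reconstructs each contour almost exactly, which is what makes the representation compact.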
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- CBNet: A Plug-and-Play Network for Segmentation-Based Scene Text Detection [13.679267531492062]
We propose a Context-aware and Boundary-guided Network (CBN) to tackle these problems.
In CBN, a basic text detector is first used to predict initial segmentation results.
Finally, we introduce a boundary-guided module to expand enhanced text kernels adaptively with only the pixels on the contours.
arXiv Detail & Related papers (2022-12-05T15:15:27Z)
- CentripetalText: An Efficient Text Instance Representation for Scene Text Detection [19.69057252363207]
We propose an efficient text instance representation named CentripetalText (CT).
CT decomposes text instances into the combination of text kernels and centripetal shifts.
For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods.
arXiv Detail & Related papers (2021-07-13T09:34:18Z)
- ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting [108.93803186429017]
End-to-end text-spotting aims to integrate detection and recognition in a unified framework.
Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2).
Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve, which, compared with segmentation-based methods, can not only provide structured output but also controllable representation.
Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the-art performance.
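The "parameterized Bezier curve" above refers to representing a text boundary with a small, fixed number of control points rather than a dense pixel mask. A minimal sketch of evaluating one cubic Bezier segment via the Bernstein basis is shown below; this is standard Bezier math, not ABCNet's code, and the function name is illustrative.

```python
import numpy as np

def cubic_bezier(control_points, ts):
    """Evaluate a cubic Bezier curve.

    control_points: (4, 2) array of 2D control points.
    ts: (T,) parameter values in [0, 1].
    Returns (T, 2) curve points computed with the Bernstein basis."""
    cp = np.asarray(control_points, dtype=float)
    t = np.asarray(ts, dtype=float)[:, None]
    b0 = (1 - t) ** 3
    b1 = 3 * t * (1 - t) ** 2
    b2 = 3 * t ** 2 * (1 - t)
    b3 = t ** 3
    return b0 * cp[0] + b1 * cp[1] + b2 * cp[2] + b3 * cp[3]
```

Because only the control points are predicted, the output is structured and directly controllable, which is the advantage the abstract claims over segmentation-based methods.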
arXiv Detail & Related papers (2021-05-08T07:46:55Z) - Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.