CentripetalText: An Efficient Text Instance Representation for Scene
Text Detection
- URL: http://arxiv.org/abs/2107.05945v1
- Date: Tue, 13 Jul 2021 09:34:18 GMT
- Title: CentripetalText: An Efficient Text Instance Representation for Scene
Text Detection
- Authors: Tao Sheng, Jie Chen, Zhouhui Lian
- Abstract summary: We propose an efficient text instance representation named CentripetalText (CT)
CT decomposes text instances into the combination of text kernels and centripetal shifts.
For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods.
- Score: 19.69057252363207
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scene text detection remains a grand challenge due to the variation in text
curvatures, orientations, and aspect ratios. One of the most intractable
problems is how to represent text instances of arbitrary shapes. Although many
state-of-the-art methods have been proposed to model irregular texts in a
flexible manner, most of them sacrifice simplicity and robustness. Their
complicated post-processing and regression under the Dirac delta distribution
undermine detection performance and generalization ability. In this paper, we
propose an efficient text instance representation named CentripetalText (CT),
which decomposes text instances into the combination of text kernels and
centripetal shifts. Specifically, we utilize centripetal shifts to implement
pixel aggregation, guiding external text pixels toward the internal text
kernels. A relaxation operation is integrated into the dense regression of
centripetal shifts, so that a prediction counts as correct when it falls
within a range rather than at a single value. The convenient reconstruction of
text contours and the tolerance of prediction errors in our method guarantee
high detection accuracy and fast inference speed, respectively. Besides, we
shrink our text detector into a proposal generation module, namely the
CentripetalText Proposal Network (CPN), which replaces the Segmentation
Proposal Network (SPN) in Mask TextSpotter v3 and produces more accurate
proposals. To validate the effectiveness of our
designs, we conduct experiments on several commonly used scene text benchmarks,
including both curved and multi-oriented text datasets. For the task of scene
text detection, our approach achieves superior or competitive performance
compared to other existing methods, e.g., F-measure of 86.3% at 40.0 FPS on
Total-Text, F-measure of 86.1% at 34.8 FPS on MSRA-TD500, etc. For the task of
end-to-end scene text recognition, we outperform Mask TextSpotter v3 by 1.1% on
Total-Text.
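The core idea above, guiding each external text pixel to a kernel via a predicted centripetal shift, can be illustrated with a minimal sketch. The function name, array shapes, and nearest-pixel rounding are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def aggregate_pixels(text_mask, kernel_labels, shifts):
    """Hypothetical sketch of CT-style pixel aggregation: each text pixel
    follows its predicted centripetal shift and adopts the label of the
    kernel it lands on.

    text_mask:     (H, W) bool, pixels predicted as text
    kernel_labels: (H, W) int, 0 = background, k > 0 = kernel instance k
    shifts:        (H, W, 2) predicted (dy, dx) pointing toward a kernel
    """
    H, W = text_mask.shape
    instance = np.zeros((H, W), dtype=np.int32)
    for y, x in zip(*np.nonzero(text_mask)):
        dy, dx = shifts[y, x]
        # Follow the shift to its target pixel (clamped to the image).
        ty = int(np.clip(np.rint(y + dy), 0, H - 1))
        tx = int(np.clip(np.rint(x + dx), 0, W - 1))
        instance[y, x] = kernel_labels[ty, tx]  # 0 if the shift misses a kernel
    return instance
```

Note how this view also explains the relaxation described above: a shift only needs to land anywhere inside the correct kernel region, not on one exact pixel, so the regression target is a range rather than a single value.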
Related papers
- EAFormer: Scene Text Segmentation with Edge-Aware Transformers [56.15069996649572]
Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts.
We propose Edge-Aware Transformers, EAFormer, to segment texts more accurately, especially at the edge of texts.
arXiv Detail & Related papers (2024-07-24T06:00:33Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form used implies a human reading order, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
arXiv Detail & Related papers (2022-07-10T15:45:16Z)
- TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask [19.269070203448187]
Arbitrary-shaped scene text detection is a challenging task due to the variety of text changes in font, size, color, and orientation.
We propose a novel light-weight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode the text masks as compact vectors.
TextDCT achieves F-measure of 85.1 at 17.2 frames per second (FPS) and F-measure of 84.9 at 15.1 FPS for CTW1500 and Total-Text datasets, respectively.
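The idea of encoding a text mask as a compact vector of low-frequency DCT coefficients can be sketched as follows. The function names, square-mask assumption, 8x8 coefficient budget, and 0.5 threshold are illustrative assumptions, not TextDCT's actual design:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n), built directly in NumPy.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row has a different normalization
    return m

def encode_mask(mask, n_coeffs=8):
    # Keep only the low-frequency corner of the mask's 2-D DCT.
    d = dct_matrix(mask.shape[0])
    coeffs = d @ mask.astype(np.float64) @ d.T
    return coeffs[:n_coeffs, :n_coeffs]

def decode_mask(compact, size):
    # Zero-pad the kept coefficients, invert the DCT, threshold to binary.
    d = dct_matrix(size)
    full = np.zeros((size, size))
    n = compact.shape[0]
    full[:n, :n] = compact
    return (d.T @ full @ d) > 0.5
```

Because the DCT concentrates the energy of smooth shapes in its low frequencies, a small coefficient block can approximate a full-resolution mask, which is what makes the representation compact.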
arXiv Detail & Related papers (2022-06-27T15:42:25Z)
- Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection [47.820683360286786]
We present a transformer-based architecture for scene text detection.
We first select a few representative features at all scales that are highly relevant to foreground text.
As each feature group corresponds to a text instance, its bounding box can be easily obtained without any post-processing operation.
arXiv Detail & Related papers (2022-03-29T04:02:31Z)
- Arbitrary Shape Text Detection using Transformers [2.294014185517203]
We propose an end-to-end trainable architecture for arbitrary-shaped text detection using Transformers (DETR).
At its core, our proposed method leverages a bounding box loss function that accurately measures the arbitrary detected text regions' changes in scale and aspect ratio.
We evaluate our proposed model using Total-Text and CTW-1500 datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for multi-oriented text.
arXiv Detail & Related papers (2022-02-22T22:36:29Z)
- PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text [85.7020597476857]
We propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes.
PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels.
As a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is very friendly to real-time applications.
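The kernel representation described above, a central kernel surrounded by peripheral pixels, implies a simple reconstruction step: grow the kernel outward over connected text pixels. A minimal sketch of that idea, with a hypothetical function name and 4-neighbourhood breadth-first growth as illustrative choices:

```python
import numpy as np
from collections import deque

def expand_kernel(text_mask, kernel_mask):
    """Hypothetical sketch of kernel-based reconstruction: recover the full
    text region by growing the central kernel outward over connected text
    pixels (breadth-first, 4-neighbourhood)."""
    H, W = text_mask.shape
    out = kernel_mask.copy()
    queue = deque(zip(*np.nonzero(kernel_mask)))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            # Absorb neighbouring text pixels not yet assigned to the kernel.
            if 0 <= ny < H and 0 <= nx < W and text_mask[ny, nx] and not out[ny, nx]:
                out[ny, nx] = True
                queue.append((ny, nx))
    return out
```

Growing from kernels rather than segmenting whole regions directly is what lets this family of methods separate adjacent text lines: two touching lines share text pixels but have distinct kernels.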
arXiv Detail & Related papers (2021-05-02T07:04:30Z)
- Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting [71.6244869235243]
Most arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals.
Our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy is not affected by nearby text or background noise.
arXiv Detail & Related papers (2020-07-18T17:25:50Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.