Fourier Contour Embedding for Arbitrary-Shaped Text Detection
- URL: http://arxiv.org/abs/2104.10442v2
- Date: Thu, 22 Apr 2021 06:03:58 GMT
- Title: Fourier Contour Embedding for Arbitrary-Shaped Text Detection
- Authors: Yiqin Zhu, Jianyong Chen, Lingyu Liang, Zhanghui Kuang, Lianwen Jin
and Wayne Zhang
- Abstract summary: We propose a novel method to represent arbitrary-shaped text contours as compact signatures.
We show that FCE accurately and robustly fits the contours of scene texts, even those with highly-curved shapes.
Our FCENet is superior to the state-of-the-art (SOTA) methods on CTW1500 and Total-Text.
- Score: 47.737805731529455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the main challenges for arbitrary-shaped text detection is to design a
good text instance representation that allows networks to learn diverse text
geometry variances. Most existing methods model text instances in the image
spatial domain via masks or contour point sequences in the Cartesian or the
polar coordinate system. However, the mask representation might lead to
expensive post-processing, while the point sequence one may have limited
capability to model texts with highly-curved shapes. To tackle these problems,
we model text instances in the Fourier domain and propose a novel Fourier
Contour Embedding (FCE) method to represent arbitrary-shaped text contours as
compact signatures. We further construct FCENet with a backbone, feature
pyramid networks (FPN), and a simple post-processing step with the Inverse Fourier
Transformation (IFT) and Non-Maximum Suppression (NMS). Different from previous
methods, FCENet first predicts compact Fourier signatures of text instances,
and then reconstructs text contours via the IFT and NMS at test time. Extensive
experiments demonstrate that FCE accurately and robustly fits the contours of
scene texts, even those with highly-curved shapes, and also validate the
effectiveness and good generalization of FCENet for arbitrary-shaped text detection.
Furthermore, experimental results show that our FCENet is superior to the
state-of-the-art (SOTA) methods on CTW1500 and Total-Text, especially on
the challenging highly-curved text subsets.
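To make the contour representation concrete, the snippet below sketches in plain NumPy how a closed contour can be compressed into a few low-frequency Fourier coefficients and then rebuilt with an inverse transform. It follows the generic Fourier-series recipe rather than the paper's exact formulation; the function names, the choice of k = 5 frequencies, and the toy elliptical contour are illustrative assumptions.

```python
import numpy as np

def fourier_signature(contour: np.ndarray, k: int = 5) -> np.ndarray:
    """Encode a closed contour (N x 2 array of (x, y) points) as the 2k + 1
    lowest-frequency coefficients of the DFT of the complex signal x + iy."""
    z = contour[:, 0] + 1j * contour[:, 1]            # contour as a complex signal
    coeffs = np.fft.fft(z) / len(z)                    # normalized DFT
    idx = np.concatenate([np.arange(0, k + 1), np.arange(-k, 0)])
    return coeffs[idx]                                 # frequencies 0..k and -k..-1

def reconstruct_contour(signature: np.ndarray, n_points: int = 100) -> np.ndarray:
    """Inverse transform: rebuild n_points (x, y) contour points from a signature."""
    k = (len(signature) - 1) // 2
    freqs = np.concatenate([np.arange(0, k + 1), np.arange(-k, 0)])
    t = np.arange(n_points) / n_points                 # parameter along the contour
    z = sum(c * np.exp(2j * np.pi * f * t) for c, f in zip(signature, freqs))
    return np.stack([z.real, z.imag], axis=1)

# Toy example: an ellipse standing in for a smooth text contour.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
contour = np.stack([100 * np.cos(theta), 40 * np.sin(theta)], axis=1)
sig = fourier_signature(contour, k=5)                  # 11 complex numbers
recon = reconstruct_contour(sig, n_points=200)
print(np.abs(recon - contour).max())                   # near-zero reconstruction error
```

Per the abstract, FCENet regresses such compact signatures for each text instance and, at test time, turns them back into contours with the Inverse Fourier Transformation before removing duplicate detections with NMS.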
Related papers
- PBFormer: Capturing Complex Scene Text Shape with Polynomial Band
Transformer [28.52028534365144]
We present PBFormer, an efficient yet powerful scene text detector.
It unifies a transformer with a novel text shape representation, the Polynomial Band (PB).
A simple additional operation helps the detector handle small-scale texts.
arXiv Detail & Related papers (2023-08-29T03:41:27Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet; a generic sketch of the low-rank idea appears after this list.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form used implies a human reading order, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
arXiv Detail & Related papers (2022-07-10T15:45:16Z)
- TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask [19.269070203448187]
Arbitrary-shaped scene text detection is a challenging task due to the large variations of text in font, size, color, and orientation.
We propose a novel lightweight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode text masks as compact vectors; a generic sketch of this encoding appears after this list.
TextDCT achieves an F-measure of 85.1 at 17.2 frames per second (FPS) and an F-measure of 84.9 at 15.1 FPS on the CTW1500 and Total-Text datasets, respectively.
arXiv Detail & Related papers (2022-06-27T15:42:25Z)
- Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection [47.820683360286786]
We present a transformer-based architecture for scene text detection.
We first select a few representative features at all scales that are highly relevant to foreground text.
The sampled features are then grouped so that each group corresponds to a text instance, whose bounding box can be obtained without any post-processing operation.
arXiv Detail & Related papers (2022-03-29T04:02:31Z)
- Arbitrary Shape Text Detection using Transformers [2.294014185517203]
We propose an end-to-end trainable architecture for arbitrary-shaped text detection using Transformers (DETR).
At its core, our proposed method leverages a bounding-box loss function that accurately measures changes in scale and aspect ratio of arbitrarily shaped detected text regions.
We evaluate our proposed model using Total-Text and CTW-1500 datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for multi-oriented text.
arXiv Detail & Related papers (2022-02-22T22:36:29Z)
- Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images [8.180563824325086]
We propose a simple yet effective method for accurate arbitrary-shaped nearby scene text detection.
A One-to-Many Training Scheme (OMTS) is designed to eliminate confusion and enable the proposals to learn more appropriate ground truths.
We also propose a Proposal Feature Attention Module (PFAM) to exploit more effective features for each proposal.
arXiv Detail & Related papers (2021-09-08T06:25:37Z)
- PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text [85.7020597476857]
We propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes.
PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels.
As a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is very friendly to real-time applications.
arXiv Detail & Related papers (2021-05-02T07:04:30Z)
- ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection [147.10751375922035]
We propose the ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with a high response value in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z)
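Two entries above compress a text shape into a short vector in a spirit similar to FCE: TextDCT encodes the instance mask with a discrete cosine transform, and LRANet parameterizes contours through low-rank approximation. The sketches below illustrate only the generic underlying ideas, not those papers' actual pipelines; the sizes (a 64x64 mask, an 8x8 block of kept DCT coefficients, rank 6) and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_mask(mask: np.ndarray, keep: int = 8) -> np.ndarray:
    """Encode a binary text mask (H x W) as the keep x keep block of
    lowest-frequency 2-D DCT coefficients, flattened to a short vector."""
    coeffs = dctn(mask.astype(float), norm="ortho")
    return coeffs[:keep, :keep].ravel()

def decode_mask(vector: np.ndarray, shape: tuple, keep: int = 8) -> np.ndarray:
    """Zero-pad the coefficient block back to full size, invert, and threshold."""
    coeffs = np.zeros(shape)
    coeffs[:keep, :keep] = vector.reshape(keep, keep)
    return idctn(coeffs, norm="ortho") > 0.5

mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 8:56] = True                   # a rectangle standing in for a text region
vec = encode_mask(mask)                    # 64 numbers instead of 4096 pixels
recon = decode_mask(vec, mask.shape)
print((recon == mask).mean())              # fraction of pixels recovered
```

A low-rank contour basis in the spirit of LRANet can likewise be learned with an SVD over aligned, equally resampled training contours, after which each contour is described by a handful of coefficients:

```python
import numpy as np

def build_contour_basis(train_contours: np.ndarray, rank: int = 6):
    """train_contours: (M, 2N) array, each row a flattened, aligned contour of
    N (x, y) points. Returns the mean contour and a rank-`rank` shape basis."""
    mean = train_contours.mean(axis=0)
    _, _, vt = np.linalg.svd(train_contours - mean, full_matrices=False)
    return mean, vt[:rank]

def encode_contour(contour, mean, basis):
    return basis @ (contour - mean)        # `rank` coefficients per contour

def decode_contour(coeffs, mean, basis):
    return mean + coeffs @ basis           # approximate contour reconstruction
```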