TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask
- URL: http://arxiv.org/abs/2206.13381v1
- Date: Mon, 27 Jun 2022 15:42:25 GMT
- Title: TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask
- Authors: Yuchen Su, Zhiwen Shao, Yong Zhou, Fanrong Meng, Hancheng Zhu, Bing
Liu, and Rui Yao
- Abstract summary: Arbitrary-shaped scene text detection is a challenging task due to the wide variation of text in font, size, color, and orientation.
We propose a novel lightweight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode text masks as compact vectors.
TextDCT achieves an F-measure of 85.1 at 17.2 frames per second (FPS) and 84.9 at 15.1 FPS on the CTW1500 and Total-Text datasets, respectively.
- Score: 19.269070203448187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Arbitrary-shaped scene text detection is a challenging task due to the
wide variation of text in font, size, color, and orientation. Most existing
regression-based methods regress the masks or contour points of text regions to
model the text instances. However, regressing complete masks entails high
training complexity, and contour points are insufficient to capture the details
of highly curved text. To tackle these limitations, we
propose a novel lightweight anchor-free text detection framework called
TextDCT, which adopts the discrete cosine transform (DCT) to encode the text
masks as compact vectors. Further, considering the imbalanced number of
training samples among pyramid layers, we only employ a single-level head for
top-down prediction. To model multi-scale texts in a single-level head, we
introduce a novel positive sampling strategy that treats shrunk text regions as
positive samples, and design a feature awareness module (FAM) for spatial and
scale awareness by fusing rich contextual information and focusing on the most
significant features. Moreover, we propose a segmented
non-maximum suppression (S-NMS) method that can filter low-quality mask
regressions. Extensive experiments on four challenging datasets demonstrate
that TextDCT achieves competitive performance in both accuracy and efficiency.
Specifically, it achieves an F-measure of 85.1 at 17.2 frames per second (FPS)
and 84.9 at 15.1 FPS on the CTW1500 and Total-Text datasets, respectively.
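The core idea of encoding a binary text mask as a compact vector of low-frequency DCT coefficients can be sketched as follows. This is a minimal illustration using SciPy, not TextDCT's exact configuration: the coefficient count (a k x k block here), the coefficient ordering, and the binarization threshold are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_mask(mask, k=8):
    """Encode a binary mask as a compact vector of k*k low-frequency
    2-D DCT coefficients (top-left block of the spectrum)."""
    coeffs = dctn(mask.astype(np.float64), norm="ortho")
    return coeffs[:k, :k].ravel()

def decode_mask(vec, shape, k=8, thresh=0.5):
    """Zero-pad the coefficient block back to the full spectrum,
    apply the inverse DCT, and threshold to recover a binary mask."""
    coeffs = np.zeros(shape)
    coeffs[:k, :k] = vec.reshape(k, k)
    return (idctn(coeffs, norm="ortho") > thresh).astype(np.uint8)
```

Because text masks are dominated by low spatial frequencies, a few dozen coefficients typically reconstruct the mask closely, which is what makes such a fixed-length vector a practical regression target for a detection head.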
Related papers
- CT-Net: Arbitrary-Shaped Text Detection via Contour Transformer [19.269070203448187]
We propose a novel arbitrary-shaped scene text detection framework named CT-Net by progressive contour regression with contour transformers.
CT-Net achieves F-measure of 86.1 at 11.2 frames per second (FPS) and F-measure of 87.8 at 10.1 FPS for CTW1500 and Total-Text datasets, respectively.
arXiv Detail & Related papers (2023-07-25T08:00:40Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection [47.820683360286786]
We present a transformer-based architecture for scene text detection.
We first select a few representative features at all scales that are highly relevant to foreground text.
As each feature group corresponds to a text instance, its bounding box can be easily obtained without any post-processing operation.
arXiv Detail & Related papers (2022-03-29T04:02:31Z)
- Arbitrary Shape Text Detection using Transformers [2.294014185517203]
We propose an end-to-end trainable architecture for arbitrary-shaped text detection using Transformers (DETR).
At its core, our proposed method leverages a bounding box loss function that accurately measures changes in scale and aspect ratio of arbitrarily shaped detected text regions.
We evaluate our proposed model using Total-Text and CTW-1500 datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for multi-oriented text.
arXiv Detail & Related papers (2022-02-22T22:36:29Z)
- CentripetalText: An Efficient Text Instance Representation for Scene Text Detection [19.69057252363207]
We propose an efficient text instance representation named CentripetalText (CT).
CT decomposes text instances into the combination of text kernels and centripetal shifts.
For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods.
arXiv Detail & Related papers (2021-07-13T09:34:18Z)
- RSCA: Real-time Segmentation-based Context-Aware Scene Text Detection [14.125634725954848]
We propose RSCA: a Real-time Segmentation-based Context-Aware model for arbitrary-shaped scene text detection.
Based on these strategies, RSCA achieves state-of-the-art performance in both speed and accuracy, without complex label assignments or repeated feature aggregations.
arXiv Detail & Related papers (2021-05-26T18:43:17Z)
- Fourier Contour Embedding for Arbitrary-Shaped Text Detection [47.737805731529455]
We propose a novel method, Fourier Contour Embedding (FCE), to represent arbitrary-shaped text contours as compact signatures.
We show that FCE is accurate and robust in fitting the contours of scene texts, even those with highly curved shapes.
Our FCENet is superior to the state-of-the-art (SOTA) methods on CTW1500 and Total-Text.
arXiv Detail & Related papers (2021-04-21T10:21:57Z)
- BOTD: Bold Outline Text Detector [85.33700624095181]
We propose a new one-stage text detector, termed Bold Outline Text Detector (BOTD).
BOTD is able to process the arbitrary-shaped text with low model complexity.
Experimental results on three real-world benchmarks show the state-of-the-art performance of BOTD.
arXiv Detail & Related papers (2020-11-30T11:54:14Z)
- Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting [71.6244869235243]
Most arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals.
Our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy is not affected by nearby text or background noise.
arXiv Detail & Related papers (2020-07-18T17:25:50Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
A novel Shape Transform Module (STM) then transforms the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.