Arbitrary Shape Text Detection using Transformers
- URL: http://arxiv.org/abs/2202.11221v1
- Date: Tue, 22 Feb 2022 22:36:29 GMT
- Title: Arbitrary Shape Text Detection using Transformers
- Authors: Zobeir Raisi, Georges Younes, and John Zelek
- Abstract summary: We propose an end-to-end trainable architecture for arbitrary-shaped text detection using Transformers (DETR).
At its core, our proposed method leverages a bounding box loss function that accurately measures the arbitrary detected text regions' changes in scale and aspect ratio.
We evaluate our proposed model using Total-Text and CTW-1500 datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for multi-oriented text.
- Score: 2.294014185517203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent text detection frameworks require several handcrafted components such
as anchor generation, non-maximum suppression (NMS), or multiple processing
stages (e.g. label generation) to detect arbitrarily shaped text in images. In
contrast, we propose an end-to-end trainable architecture based on Detection
using Transformers (DETR), that outperforms previous state-of-the-art methods
in arbitrary-shaped text detection. At its core, our proposed method leverages
a bounding box loss function that accurately measures the arbitrary detected
text regions' changes in scale and aspect ratio. This is possible due to a
hybrid shape representation made from Bezier curves that are further split
into piece-wise polygons. The proposed loss function is then a combination of a
generalized-split-intersection-over-union loss defined over the piece-wise
polygons and regularized by a Smooth-$\ln$ regression over the Bezier curve's
control points. We evaluate our proposed model using Total-Text and CTW-1500
datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for
multi-oriented text, and show that the proposed method outperforms the previous
state-of-the-art methods in arbitrary-shape text detection tasks.
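The two ingredients of the proposed loss can be sketched in a few lines: sampling a cubic Bezier boundary into the piece-wise polygon points the split-IoU term operates on, and the Smooth-ln regression over control points. This is an illustrative reconstruction, not the authors' code; the function names and the sampling count are assumptions, and the Smooth-ln form `(|d|+1)ln(|d|+1) - |d|` follows the common definition from earlier text-detection work:

```python
import math

def cubic_bezier(ctrl, n=8):
    """Sample a cubic Bezier curve given 4 control points [(x, y), ...]
    at n evenly spaced parameter values; returns the polyline whose
    segments form the piece-wise polygon boundary."""
    pts = []
    for i in range(n):
        t = i / (n - 1)
        # Bernstein basis weights for a cubic curve at parameter t
        b = [(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3]
        x = sum(w * p[0] for w, p in zip(b, ctrl))
        y = sum(w * p[1] for w, p in zip(b, ctrl))
        pts.append((x, y))
    return pts

def smooth_ln(pred, target):
    """Smooth-ln regression over control-point coordinates:
    loss(d) = (|d| + 1) * ln(|d| + 1) - |d|, averaged over all
    coordinate deviations d. Its gradient grows only logarithmically,
    damping the influence of large outlier deviations."""
    total, count = 0.0, 0
    for (px, py), (tx, ty) in zip(pred, target):
        for d in (abs(px - tx), abs(py - ty)):
            total += (d + 1.0) * math.log(d + 1.0) - d
            count += 1
    return total / count
```

In the paper's full loss, the sampled points from a top and a bottom Bezier curve would be paired into piece-wise polygons for the generalized-split-IoU term, with the `smooth_ln` term acting as the regularizer over the control points.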
Related papers
- CT-Net: Arbitrary-Shaped Text Detection via Contour Transformer [19.269070203448187]
We propose a novel arbitrary-shaped scene text detection framework named CT-Net by progressive contour regression with contour transformers.
CT-Net achieves F-measure of 86.1 at 11.2 frames per second (FPS) and F-measure of 87.8 at 10.1 FPS for CTW1500 and Total-Text datasets, respectively.
arXiv Detail & Related papers (2023-07-25T08:00:40Z) - LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z) - DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form used implies a human reading order, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
arXiv Detail & Related papers (2022-07-10T15:45:16Z) - TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask [19.269070203448187]
Arbitrary-shaped scene text detection is a challenging task due to the wide variation of text in font, size, color, and orientation.
We propose a novel light-weight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode the text masks as compact vectors.
TextDCT achieves F-measure of 85.1 at 17.2 frames per second (FPS) and F-measure of 84.9 at 15.1 FPS for CTW1500 and Total-Text datasets, respectively.
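The DCT encoding idea behind TextDCT can be illustrated with a minimal 1-D sketch: transform a mask signal with a Type-II DCT, keep only the leading low-frequency coefficients as the compact vector, and invert to reconstruct. This is not the authors' implementation (which operates on 2-D masks); the function names and the unnormalized DCT-II/III convention are assumptions:

```python
import math

def dct2_1d(x):
    """Unnormalized Type-II DCT of a 1-D signal:
    X_k = sum_i x_i * cos(pi * k * (2i + 1) / (2n))."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

def idct2_1d(c):
    """Inverse (Type-III) DCT matching dct2_1d:
    x_i = c_0 / n + (2/n) * sum_{k>=1} c_k * cos(pi * k * (2i + 1) / (2n))."""
    n = len(c)
    return [c[0] / n + (2.0 / n) * sum(
                c[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for k in range(1, n))
            for i in range(n)]

def compress(x, keep):
    """Keep only the first `keep` low-frequency coefficients and zero
    the rest -- the compact vector standing in for the full mask."""
    c = dct2_1d(x)
    return c[:keep] + [0.0] * (len(c) - keep)
```

Decoding `compress(x, keep)` with `idct2_1d` yields a smoothed approximation of the original mask signal, which is the trade-off that lets a short coefficient vector represent an arbitrarily shaped mask.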
arXiv Detail & Related papers (2022-06-27T15:42:25Z) - Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection [47.820683360286786]
We present a transformer-based architecture for scene text detection.
We first select a few representative features at all scales that are highly relevant to foreground text.
As each feature group corresponds to a text instance, its bounding box can be easily obtained without any post-processing operation.
arXiv Detail & Related papers (2022-03-29T04:02:31Z) - Bidirectional Regression for Arbitrary-Shaped Text Detection [16.30976392505236]
This paper presents a novel text instance representation that integrates both foreground and background information into the pipeline.
A corresponding post-processing algorithm is also designed to sequentially combine the four prediction results and reconstruct the text instance accurately.
We evaluate our method on several challenging scene text benchmarks, including both curved and multi-oriented text datasets.
arXiv Detail & Related papers (2021-07-13T14:29:09Z) - RayNet: Real-time Scene Arbitrary-shape Text Detection with Multiple Rays [84.15123599963239]
We propose a novel detection framework for arbitrary-shape text detection, termed as RayNet.
RayNet uses a Center Point Set (CPS) and Ray Distance (RD) to fit text: the CPS determines the general position of the text, and the RD is combined with the CPS to compute Ray Points (RP) that localize the text's precise shape.
RayNet achieves impressive performance on an existing curved text dataset (CTW1500) and a quadrangle text dataset (ICDAR2015).
arXiv Detail & Related papers (2021-04-11T03:03:23Z) - BOTD: Bold Outline Text Detector [85.33700624095181]
We propose a new one-stage text detector, termed Bold Outline Text Detector (BOTD).
BOTD is able to process the arbitrary-shaped text with low model complexity.
Experimental results on three real-world benchmarks show the state-of-the-art performance of BOTD.
arXiv Detail & Related papers (2020-11-30T11:54:14Z) - Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.