DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in
Transformer
- URL: http://arxiv.org/abs/2207.04491v1
- Date: Sun, 10 Jul 2022 15:45:16 GMT
- Title: DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in
Transformer
- Authors: Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, Dacheng Tao
- Abstract summary: Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form used implies the human reading order, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Transformer-based methods, which predict polygon points or Bezier
curve control points to localize texts, are quite popular in scene text
detection. However, the point label form used implies the human reading order,
which affects the robustness of the Transformer model. As for the model
architecture, the formulation of queries used in the decoder has not been fully
explored by previous methods. In this paper, we propose a concise dynamic point
scene text detection Transformer network termed DPText-DETR, which directly
uses point coordinates as queries and dynamically updates them between decoder
layers. We point out a simple yet effective positional point label form to
tackle the side effect of the original one. Moreover, an Enhanced Factorized
Self-Attention module is designed to explicitly model the circular shape of
polygon point sequences beyond non-local attention. Extensive experiments
demonstrate the training efficiency, robustness, and state-of-the-art performance
on various arbitrary-shape scene text benchmarks. Beyond the detector, we observe that
existing end-to-end spotters struggle to recognize inverse-like texts. To
evaluate their performance objectively and facilitate future research, we
propose an Inverse-Text test set containing 500 manually labeled images. The
code and Inverse-Text test set will be available at
https://github.com/ymy-k/DPText-DETR.
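The positional point label form replaces the reading-order-dependent labeling with an ordering based purely on position. A minimal sketch of this idea is below: polygon control points are reordered to start from the top-most point and proceed clockwise. The function name and the exact starting convention are illustrative assumptions, not the paper's reference implementation.

```python
def positional_order(points):
    """Reorder a closed polygon's points so the sequence starts at the
    top-most (then left-most) point and proceeds clockwise, independent
    of the human reading order. `points` is a list of (x, y) tuples in
    image coordinates, where smaller y means higher up."""
    n = len(points)
    # Signed (shoelace) area. In image coordinates (y grows downward),
    # a positive value of this sum corresponds to clockwise order.
    area = sum(points[i][0] * points[(i + 1) % n][1]
               - points[(i + 1) % n][0] * points[i][1]
               for i in range(n))
    if area < 0:  # counter-clockwise: reverse to make it clockwise
        points = points[::-1]
    # Rotate the sequence so it starts at the top-most, then left-most, point.
    start = min(range(n), key=lambda i: (points[i][1], points[i][0]))
    return points[start:] + points[:start]
```

Under this convention, two annotators tracing the same polygon from different starting points and directions produce identical label sequences.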
Related papers
- EAFormer: Scene Text Segmentation with Edge-Aware Transformers [56.15069996649572]
Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts.
We propose Edge-Aware Transformers, EAFormer, to segment texts more accurately, especially at the edge of texts.
arXiv Detail & Related papers (2024-07-24T06:00:33Z)
- SPTS v2: Single-Point Scene Text Spotting [146.98118405786445]
The new framework, SPTS v2, allows training high-performing text-spotting models using a single-point annotation.
Tests show SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters.
Experiments suggest a potential preference for single-point representation in scene text spotting.
arXiv Detail & Related papers (2023-01-04T14:20:14Z)
- DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting [129.73247700864385]
DeepSolo is a simple detection-transformer baseline in which a single decoder with explicit points handles text detection and recognition simultaneously.
We introduce a text-matching criterion to deliver more accurate supervisory signals, thus enabling more efficient training.
arXiv Detail & Related papers (2022-11-19T19:06:22Z)
- Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection [47.820683360286786]
We present a transformer-based architecture for scene text detection.
We first select a few representative features at all scales that are highly relevant to foreground text.
As each feature group corresponds to a text instance, its bounding box can be easily obtained without any post-processing operation.
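The claim that no post-processing is needed follows from each group mapping to exactly one instance: the box is just the extent of the grouped feature locations. The sketch below illustrates this; the function name and (x, y) layout are assumptions for illustration, not the paper's implementation.

```python
def group_bbox(group):
    """Given the (x, y) locations of the sampled features assigned to
    one text instance, return its axis-aligned bounding box as
    (x0, y0, x1, y1). Because groups and instances correspond one-to-one,
    no post-processing step such as NMS is required."""
    xs = [x for x, _ in group]
    ys = [y for _, y in group]
    return (min(xs), min(ys), max(xs), max(ys))
```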
arXiv Detail & Related papers (2022-03-29T04:02:31Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Arbitrary Shape Text Detection using Transformers [2.294014185517203]
We propose an end-to-end trainable architecture for arbitrary-shaped text detection using Transformers (DETR).
At its core, our proposed method leverages a bounding box loss function that accurately measures the arbitrary detected text regions' changes in scale and aspect ratio.
We evaluate our proposed model using Total-Text and CTW-1500 datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for multi-oriented text.
arXiv Detail & Related papers (2022-02-22T22:36:29Z)
- CentripetalText: An Efficient Text Instance Representation for Scene Text Detection [19.69057252363207]
We propose an efficient text instance representation named CentripetalText (CT).
CT decomposes text instances into the combination of text kernels and centripetal shifts.
For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods.
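The kernel-plus-shift decomposition can be sketched as follows: each text pixel carries a predicted shift pointing toward its kernel, and a pixel joins the instance whose kernel its shifted position lands on. The function name and dictionary-based data layout are illustrative assumptions, not the paper's implementation.

```python
def assign_pixels(shifts, kernel_labels):
    """Group pixels into text instances via centripetal shifts.
    `shifts` maps each pixel (x, y) to a predicted (dx, dy) pointing
    toward its text kernel; `kernel_labels` maps kernel pixels to an
    instance id. A pixel joins the instance whose kernel its shifted
    position lands on; otherwise it stays background (id 0)."""
    labels = {}
    for (x, y), (dx, dy) in shifts.items():
        target = (x + dx, y + dy)
        labels[(x, y)] = kernel_labels.get(target, 0)
    return labels
```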
arXiv Detail & Related papers (2021-07-13T09:34:18Z)
- Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links [38.71967078941593]
A mask-guided multi-task network detects and rectifies scene texts of arbitrary shapes reliably.
Three types of keypoints are detected, which specify the centre line and hence the shape of text instances accurately.
Scene texts can be located and rectified by linking up the associated landmark points.
arXiv Detail & Related papers (2021-03-01T06:13:51Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts with scribble lines instead of polygons for text detection.
It is a general labeling method for texts of various shapes and has a low labeling cost.
Experiments show that the proposed method bridges the performance gap between this weak labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.