Text Spotting Transformers
- URL: http://arxiv.org/abs/2204.01918v1
- Date: Tue, 5 Apr 2022 01:05:31 GMT
- Title: Text Spotting Transformers
- Authors: Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu
- Abstract summary: TESTR builds upon a single encoder and dual decoders for joint text-box control point regression and character recognition.
We show a canonical representation of control points suitable for text instances in both Bezier curve and polygon annotations.
In addition, we design a bounding-box guided polygon detection (box-to-polygon) process.
- Score: 29.970268691631333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present TExt Spotting TRansformers (TESTR), a generic
end-to-end text spotting framework using Transformers for text detection and
recognition in the wild. TESTR builds upon a single encoder and dual decoders
for the joint text-box control point regression and character recognition.
Unlike most existing literature, our method is free from Region-of-Interest
operations and heuristics-driven post-processing procedures; TESTR is
particularly effective when dealing with curved text-boxes, where special care
is needed to adapt the traditional bounding-box representation.
We show our canonical representation of control points suitable for text
instances in both Bezier curve and polygon annotations. In addition, we design
a bounding-box guided polygon detection (box-to-polygon) process. Experiments
on curved and arbitrarily shaped datasets demonstrate state-of-the-art
performances of the proposed TESTR algorithm.
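The Bezier-curve control-point representation referred to in the abstract can be illustrated with a small sketch. This is an illustration only, not the paper's implementation: it assumes the common two-cubic-curve text annotation layout (four control points each for the top and bottom boundaries), and the function names are ours.

```python
import numpy as np

def cubic_bezier(ctrl, n=16):
    """Sample n points along a cubic Bezier defined by 4 control points (4, 2)."""
    t = np.linspace(0.0, 1.0, n)[:, None]           # (n, 1)
    c = np.asarray(ctrl, dtype=float)               # (4, 2)
    return ((1 - t) ** 3 * c[0]
            + 3 * (1 - t) ** 2 * t * c[1]
            + 3 * (1 - t) * t ** 2 * c[2]
            + t ** 3 * c[3])                        # (n, 2)

def bezier_to_polygon(top_ctrl, bottom_ctrl, n=16):
    """Convert a Bezier-annotated text instance (two cubic curves) into a
    closed polygon of 2*n boundary points: top left-to-right, then bottom
    right-to-left, so Bezier and polygon annotations share one point format."""
    top = cubic_bezier(top_ctrl, n)
    bottom = cubic_bezier(bottom_ctrl, n)
    return np.concatenate([top, bottom[::-1]], axis=0)
```

Sampling both annotation types into the same fixed-length point sequence is what lets a single regression head handle either format.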
Related papers
- EAFormer: Scene Text Segmentation with Edge-Aware Transformers [56.15069996649572]
Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts.
We propose Edge-Aware Transformers, EAFormer, to segment texts more accurately, especially at the edge of texts.
arXiv Detail & Related papers (2024-07-24T06:00:33Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning [41.56134008044702]
Optimal Boxes is a reinforcement learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models.
Experiments demonstrate that the performance of end-to-end text recognition systems can be improved when using the adjusted bounding boxes as the ground truths for training.
arXiv Detail & Related papers (2022-07-25T06:58:45Z)
- DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form used implies a human reading order, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
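The dynamic point update between decoder layers can be caricatured as iterative coordinate refinement: each layer predicts a per-point offset and shifts its point-coordinate queries before the next layer. This is a toy sketch, not DPText-DETR itself; the fixed linear maps below stand in for the real attention and MLP blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

def refine_layer(points, features, w):
    """One hypothetical decoder layer: predict bounded per-point offsets
    from features and shift the point-coordinate queries for the next layer."""
    offsets = np.tanh(features @ w) * 0.05   # small, bounded refinement step
    return points + offsets

# 8 point queries in normalized [0, 1] image coordinates, toy 16-d features.
points = rng.uniform(0.0, 1.0, size=(8, 2))
features = rng.normal(size=(8, 16))
layer_weights = [rng.normal(scale=0.1, size=(16, 2)) for _ in range(6)]

for w in layer_weights:                      # dynamic update between layers
    points = refine_layer(points, features, w)
```

The point being illustrated: the queries are raw coordinates that every layer re-reads and nudges, rather than opaque learned embeddings decoded only once at the end.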
arXiv Detail & Related papers (2022-07-10T15:45:16Z)
- SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
- Arbitrary Shape Text Detection using Transformers [2.294014185517203]
We propose an end-to-end trainable architecture for arbitrary-shaped text detection using Transformers (DETR).
At its core, our proposed method leverages a bounding box loss function that accurately measures the arbitrary detected text regions' changes in scale and aspect ratio.
We evaluate our proposed model using Total-Text and CTW-1500 datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for multi-oriented text.
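The summary does not spell out the box loss. One standard IoU-style loss that does respond to changes in overlap, scale, and aspect ratio is generalized IoU (GIoU), sketched below purely as an illustration; the paper defines its own variant.

```python
def giou_loss(a, b):
    """Generalized IoU loss for axis-aligned boxes (x1, y1, x2, y2).
    Unlike an L1 loss on raw coordinates, it reflects overlap, scale and
    aspect-ratio changes, and still gives a signal for disjoint boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # smallest box enclosing both inputs penalizes far-apart predictions
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    giou = inter / union - (c_area - union) / c_area
    return 1.0 - giou
```

For identical boxes the loss is 0; for disjoint boxes it exceeds 1, growing with the gap via the enclosing-box term.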
arXiv Detail & Related papers (2022-02-22T22:36:29Z)
- Fourier Contour Embedding for Arbitrary-Shaped Text Detection [47.737805731529455]
We propose a novel method to represent arbitrary shaped text contours as compact signatures.
We show that FCE is accurate and robust to fit contours of scene texts even with highly-curved shapes.
Our FCENet is superior to the state-of-the-art (SOTA) methods on CTW1500 and Total-Text.
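The idea of a compact contour signature can be sketched with classical Fourier descriptors: treat contour points as complex numbers and keep only the low-frequency coefficients. This is an approximation of FCE for illustration only; the function names and truncation scheme here are ours, not the paper's.

```python
import numpy as np

def fourier_signature(contour, k=5):
    """Compress a closed contour (n, 2) into 2k+1 complex Fourier
    coefficients by dropping all frequencies above |k|."""
    z = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(z) / len(z)
    # np.fft.fft orders frequencies 0..n/2, then negative; keep -k..k
    return np.concatenate([coeffs[:k + 1], coeffs[-k:]])

def reconstruct(signature, k=5, n=64):
    """Invert the truncated signature back to n contour points."""
    t = np.arange(n) / n
    freqs = np.concatenate([np.arange(k + 1), np.arange(-k, 0)])
    z = sum(c * np.exp(2j * np.pi * f * t) for c, f in zip(signature, freqs))
    return np.stack([z.real, z.imag], axis=1)
```

A smooth contour is dominated by its low frequencies, so even a handful of coefficients reconstructs highly-curved text boundaries closely.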
arXiv Detail & Related papers (2021-04-21T10:21:57Z)
- Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links [38.71967078941593]
Mask-guided multi-task network detects and rectifies scene texts of arbitrary shapes reliably.
Three types of keypoints are detected, which specify the centre line and hence the shape of text instances accurately.
Scene texts can be located and rectified by linking up the associated landmark points.
arXiv Detail & Related papers (2021-03-01T06:13:51Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.