SPTS v2: Single-Point Scene Text Spotting
- URL: http://arxiv.org/abs/2301.01635v4
- Date: Sat, 2 Sep 2023 05:01:23 GMT
- Title: SPTS v2: Single-Point Scene Text Spotting
- Authors: Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang,
Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
- Abstract summary: New framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation.
Tests show SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters.
Experiments suggest a potential preference for single-point representation in scene text spotting.
- Score: 146.98118405786445
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: End-to-end scene text spotting has made significant progress due to its
intrinsic synergy between text detection and recognition. Previous methods
commonly regard manual annotations such as horizontal rectangles, rotated
rectangles, quadrangles, and polygons as a prerequisite, which are much more
expensive than using a single point. Our new framework, SPTS v2, allows us to
train high-performing text-spotting models using a single-point annotation.
SPTS v2 retains the advantage of the auto-regressive Transformer through an
Instance Assignment Decoder (IAD) that sequentially predicts the center points
of all text instances within a single sequence, while a Parallel Recognition
Decoder (PRD) recognizes the text of all instances in parallel, which
significantly reduces the required sequence length. These two decoders share
the same parameters and are connected by a simple but effective
information-transmission process that passes gradients and information between
them. Comprehensive experiments on various existing benchmark datasets
demonstrate that SPTS v2 can outperform previous state-of-the-art single-point
text spotters with fewer parameters while achieving 19$\times$ faster inference
speed. Within the context of our SPTS v2 framework, our experiments suggest a
potential preference for single-point representation in scene text spotting
when compared to other representations. Such an attempt provides a significant
opportunity for scene text spotting applications beyond the realms of existing
paradigms. Code is available at: https://github.com/Yuliang-Liu/SPTSv2.
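The abstract compresses the IAD/PRD design into one sentence, so a minimal PyTorch-style sketch may help make it concrete. It is an assumption reconstructed only from the description above, not the authors' implementation: `SinglePointSpotter`, `NUM_BINS`, the greedy decoding loop, and the character-slot queries are hypothetical names and choices; only the high-level idea (one auto-regressive pass for center points, one parallel pass for recognition, shared decoder weights) follows the abstract.

```python
# Hypothetical sketch (not the authors' code): one Transformer decoder reused in
# two roles, mirroring the IAD/PRD split described in the abstract.
import torch
import torch.nn as nn

NUM_BINS = 1000          # coordinate quantization bins (assumption)
VOCAB = NUM_BINS + 128   # point tokens + character tokens + specials (assumption)
D_MODEL = 256

class SinglePointSpotter(nn.Module):
    def __init__(self, max_instances=60, max_text_len=25):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        # A single decoder whose parameters are shared between the two roles,
        # as the abstract states the IAD and PRD share parameters.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(D_MODEL, nhead=8, batch_first=True),
            num_layers=6,
        )
        self.head = nn.Linear(D_MODEL, VOCAB)
        self.char_pos = nn.Parameter(torch.randn(max_text_len, D_MODEL))
        self.max_instances = max_instances
        self.max_text_len = max_text_len

    @torch.no_grad()  # inference-style greedy decoding for illustration
    def forward(self, image_memory):
        """image_memory: (B, HW, D_MODEL) features from an image encoder."""
        B = image_memory.size(0)

        # Instance Assignment Decoder (IAD): auto-regressively emit (x, y)
        # token pairs, i.e. the quantized center point of each text instance,
        # all within one sequence.
        seq = torch.zeros(B, 1, dtype=torch.long, device=image_memory.device)
        for _ in range(2 * self.max_instances):
            out = self.decoder(self.embed(seq), image_memory)
            next_tok = self.head(out[:, -1]).argmax(-1, keepdim=True)
            seq = torch.cat([seq, next_tok], dim=1)
        point_tokens = seq[:, 1:].view(B, self.max_instances, 2)

        # Parallel Recognition Decoder (PRD): the same decoder (shared weights)
        # recognizes every instance at once. Each predicted point seeds
        # max_text_len character-slot queries, so the generated sequence no
        # longer grows with (number of instances) x (text length).
        point_emb = self.embed(point_tokens).mean(dim=2)          # (B, N, D)
        queries = point_emb.unsqueeze(2) + self.char_pos          # (B, N, L, D)
        rec = self.decoder(queries.flatten(1, 2), image_memory)   # (B, N*L, D)
        char_logits = self.head(rec).view(
            B, self.max_instances, self.max_text_len, VOCAB)
        return point_tokens, char_logits
```

Under this reading, the auto-regressive pass only has to generate two coordinate tokens per instance instead of full transcriptions, which is where the reduced sequence-length requirement mentioned in the abstract comes from.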
Related papers
- DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text
Spotting [129.73247700864385]
DeepSolo is a simple detection-transformer baseline in which a single decoder with explicit points handles ("solos") text detection and recognition simultaneously.
We introduce a text-matching criterion to deliver more accurate supervisory signals, thus enabling more efficient training.
arXiv Detail & Related papers (2022-11-19T19:06:22Z) - DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in
Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form used in these methods implies a human reading order, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
arXiv Detail & Related papers (2022-07-10T15:45:16Z) - SwinTextSpotter: Scene Text Spotting via Better Synergy between Text
Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a Transformer with a dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z) - SPTS: Single-Point Text Spotting [128.52900104146028]
We show that training scene text spotting models can be achieved with an extremely low-cost annotation of a single point for each instance.
We propose an end-to-end scene text spotting method that tackles scene text spotting as a sequence prediction task.
arXiv Detail & Related papers (2021-12-15T06:44:21Z) - Context-Free TextSpotter for Real-Time and Mobile End-to-End Text
Detection and Recognition [8.480710920894547]
We propose a text-spotting method that consists of simple convolutions and a few post-processing steps, named Context-Free TextSpotter.
Experiments using standard benchmarks show that Context-Free TextSpotter achieves real-time text spotting on a GPU with only three million parameters, which is the smallest and fastest among existing deep text spotters.
Our text spotter can run on a smartphone with affordable latency, which is valuable for building stand-alone OCR applications.
arXiv Detail & Related papers (2021-06-10T09:32:52Z) - ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text
Spotting [108.93803186429017]
End-to-end text-spotting aims to integrate detection and recognition in a unified framework.
Here, we tackle end-to-end text spotting by presenting the Adaptive Bezier Curve Network v2 (ABCNet v2).
Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text with a parameterized Bezier curve, which, compared with segmentation-based methods, provides not only structured output but also a controllable representation (a minimal sampling sketch follows this list).
Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the-art performance.
arXiv Detail & Related papers (2021-05-08T07:46:55Z)