Context-Free TextSpotter for Real-Time and Mobile End-to-End Text
Detection and Recognition
- URL: http://arxiv.org/abs/2106.05611v1
- Date: Thu, 10 Jun 2021 09:32:52 GMT
- Title: Context-Free TextSpotter for Real-Time and Mobile End-to-End Text
Detection and Recognition
- Authors: Ryota Yoshihashi, Tomohiro Tanaka, Kenji Doi, Takumi Fujino, and
Naoaki Yamashita
- Abstract summary: We propose a text-spotting method that consists of simple convolutions and a few post-processes, named Context-Free TextSpotter.
Experiments using standard benchmarks show that Context-Free TextSpotter achieves real-time text spotting on a GPU with only three million parameters, which is the smallest and fastest among existing deep text spotters.
Our text spotter can run on a smartphone with affordable latency, which is valuable for building stand-alone OCR applications.
- Score: 8.480710920894547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the deployment of scene-text spotting systems on mobile platforms,
lightweight models with low computation are preferable. In concept, end-to-end
(E2E) text spotting is suitable for such purposes because it performs text
detection and recognition in a single model. However, current state-of-the-art
E2E methods rely on heavy feature extractors, recurrent sequence modellings,
and complex shape aligners to pursue accuracy, which means their computations
are still heavy. We explore the opposite direction: How far can we go without
bells and whistles in E2E text spotting? To this end, we propose a
text-spotting method that consists of simple convolutions and a few
post-processes, named Context-Free TextSpotter. Experiments using standard
benchmarks show that Context-Free TextSpotter achieves real-time text spotting
on a GPU with only three million parameters, which is the smallest and fastest
among existing deep text spotters, with an acceptable transcription quality
degradation compared to heavier ones. Further, we demonstrate that our text
spotter can run on a smartphone with affordable latency, which is valuable for
building stand-alone OCR applications.
Related papers
- TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model [17.77384627944455]
Existing scene text spotters are designed to locate and transcribe texts from images.
Our proposed scene text spotter leverages advanced PLMs to enhance performance without fine-grained detection.
Benefiting from the comprehensive language knowledge gained during the pre-training phase, the PLM-based recognition module effectively handles complex scenarios.
arXiv Detail & Related papers (2024-03-15T06:38:25Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Video text tracking for dense and small text based on pp-yoloe-r and
sort algorithm [0.9137554315375919]
DSText is 1080 * 1920 and slicing the video frame into several areas will destroy the spatial correlation of text.
For text detection, we adopt the PP-YOLOE-R which is proven effective in small object detection.
For text detection, we use the sort algorithm for high inference speed.
arXiv Detail & Related papers (2023-03-31T05:40:39Z) - SPTS v2: Single-Point Scene Text Spotting [146.98118405786445]
New framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation.
Tests show SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters.
Experiments suggest a potential preference for single-point representation in scene text spotting.
arXiv Detail & Related papers (2023-01-04T14:20:14Z) - Decoupling Recognition from Detection: Single Shot Self-Reliant Scene
Text Spotter [34.09162878714425]
We propose the single shot Self-Reliant Scene Text Spotter (SRSTS)
We conduct text detection and recognition in parallel and bridge them by the shared positive anchor point.
Our method is able to recognize the text instances correctly even though the precise text boundaries are challenging to detect.
arXiv Detail & Related papers (2022-07-15T01:59:14Z) - SwinTextSpotter: Scene Text Spotting via Better Synergy between Text
Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z) - Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer [21.479222207347238]
We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting.
TTS is trained with both fully- and weakly-supervised settings.
trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
arXiv Detail & Related papers (2022-02-11T08:50:09Z) - MOST: A Multi-Oriented Scene Text Detector with Localization Refinement [67.35280008722255]
We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization.
Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features.
A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to exclude unreliable ones.
arXiv Detail & Related papers (2021-04-02T14:34:41Z) - AE TextSpotter: Learning Visual and Linguistic Representation for
Ambiguous Text Spotting [98.08853679310603]
This work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter)
AE TextSpotter learns both visual and linguistic features to significantly reduce ambiguity in text detection.
To our knowledge, it is the first time to improve text detection by using a language model.
arXiv Detail & Related papers (2020-08-03T08:40:01Z) - Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.