Related papers: Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition

URL: http://arxiv.org/abs/2106.05611v1
Date: Thu, 10 Jun 2021 09:32:52 GMT
Title: Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition
Authors: Ryota Yoshihashi, Tomohiro Tanaka, Kenji Doi, Takumi Fujino, and Naoaki Yamashita
Abstract summary: We propose a text-spotting method that consists of simple convolutions and a few post-processes, named Context-Free TextSpotter. Experiments using standard benchmarks show that Context-Free TextSpotter achieves real-time text spotting on a GPU with only three million parameters, which is the smallest and fastest among existing deep text spotters. Our text spotter can run on a smartphone with affordable latency, which is valuable for building stand-alone OCR applications.
Score: 8.480710920894547
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the deployment of scene-text spotting systems on mobile platforms, lightweight models with low computation are preferable. In concept, end-to-end (E2E) text spotting is suitable for such purposes because it performs text detection and recognition in a single model. However, current state-of-the-art E2E methods rely on heavy feature extractors, recurrent sequence modellings, and complex shape aligners to pursue accuracy, which means their computations are still heavy. We explore the opposite direction: How far can we go without bells and whistles in E2E text spotting? To this end, we propose a text-spotting method that consists of simple convolutions and a few post-processes, named Context-Free TextSpotter. Experiments using standard benchmarks show that Context-Free TextSpotter achieves real-time text spotting on a GPU with only three million parameters, which is the smallest and fastest among existing deep text spotters, with an acceptable transcription quality degradation compared to heavier ones. Further, we demonstrate that our text spotter can run on a smartphone with affordable latency, which is valuable for building stand-alone OCR applications.

Related papers

GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking [77.0306273129475]
Video text spotting (VTS) extends image text spotting (ITS) by adding text tracking. Despite progress in VTS, existing methods still fall short of the performance seen in ITS. GoMatching++ transforms an off-the-shelf image text spotter into a video specialist.
arXiv Detail & Related papers (2025-05-28T11:02:45Z)
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance [6.93632116687419]
Local semantic knowledge includes not only text content but also spatial information in the right reading order. We propose the Local Semantics Guided scene text Spotter (LSGSpotter), which auto-regressively decodes the position and content of characters guided by the local semantics. LSGSpotter handles arbitrary reading order spotting without the limitation of sophisticated detection.
arXiv Detail & Related papers (2024-12-13T14:20:43Z)
TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model [17.77384627944455]
Existing scene text spotters are designed to locate and transcribe texts from images. Our proposed scene text spotter leverages advanced PLMs to enhance performance without fine-grained detection. Benefiting from the comprehensive language knowledge gained during the pre-training phase, the PLM-based recognition module effectively handles complex scenarios.
arXiv Detail & Related papers (2024-03-15T06:38:25Z)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
Video text tracking for dense and small text based on pp-yoloe-r and sort algorithm [0.9137554315375919]
DSText frames are 1080 * 1920, and slicing the video frame into several areas will destroy the spatial correlation of text. For text detection, we adopt PP-YOLOE-R, which has proven effective in small object detection. For text tracking, we use the SORT algorithm for high inference speed.
arXiv Detail & Related papers (2023-03-31T05:40:39Z)
SPTS v2: Single-Point Scene Text Spotting [146.98118405786445]
A new framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation. Tests show SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters. Experiments suggest a potential preference for single-point representation in scene text spotting.
arXiv Detail & Related papers (2023-01-04T14:20:14Z)
Decoupling Recognition from Detection: Single Shot Self-Reliant Scene Text Spotter [34.09162878714425]
We propose the single shot Self-Reliant Scene Text Spotter (SRSTS). We conduct text detection and recognition in parallel and bridge them by the shared positive anchor point. Our method is able to recognize the text instances correctly even though the precise text boundaries are challenging to detect.
arXiv Detail & Related papers (2022-07-15T01:59:14Z)
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter. Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism. The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z)
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer [21.479222207347238]
We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting. TTS is trained in both fully- and weakly-supervised settings. When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
arXiv Detail & Related papers (2022-02-11T08:50:09Z)
MOST: A Multi-Oriented Scene Text Detector with Localization Refinement [67.35280008722255]
We propose a new algorithm for scene text detection, which puts forward a set of strategies to significantly improve the quality of text localization. Specifically, a Text Feature Alignment Module (TFAM) is proposed to dynamically adjust the receptive fields of features. A Position-Aware Non-Maximum Suppression (PA-NMS) module is devised to exclude unreliable detections.
arXiv Detail & Related papers (2021-04-02T14:34:41Z)
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting [98.08853679310603]
This work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter). AE TextSpotter learns both visual and linguistic features to significantly reduce ambiguity in text detection. To our knowledge, this is the first work to improve text detection by using a language model.
arXiv Detail & Related papers (2020-08-03T08:40:01Z)
Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron. It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information. Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.