PP-OCR: A Practical Ultra Lightweight OCR System
- URL: http://arxiv.org/abs/2009.09941v3
- Date: Thu, 15 Oct 2020 14:21:53 GMT
- Title: PP-OCR: A Practical Ultra Lightweight OCR System
- Authors: Yuning Du, Chenxia Li, Ruoyu Guo, Xiaoting Yin, Weiwei Liu, Jun Zhou,
Yifan Bai, Zilin Yu, Yehua Yang, Qingqing Dang, Haoshuang Wang
- Abstract summary: We propose a practical ultra lightweight OCR system, i.e., PP-OCR.
The overall model size of PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols.
- Score: 8.740684949994664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical Character Recognition (OCR) systems are widely used in a
variety of application scenarios, such as office automation (OA) systems,
factory automation, online education, and map production. However, OCR remains
a challenging task due to the diversity of text appearances and the demand for
computational efficiency. In this paper, we propose a practical ultra
lightweight OCR system, PP-OCR. The overall model size of PP-OCR is only 3.5M
for recognizing 6622 Chinese characters and 2.8M for recognizing 63
alphanumeric symbols. We introduce a bag of strategies to either enhance the
model ability or reduce the model size, and provide the corresponding ablation
experiments on real data. We also release several pre-trained models for
Chinese and English recognition, including a text detector (trained on 97K
images), a direction classifier (trained on 600K images) and a text recognizer
(trained on 17.9M images). In addition, the proposed PP-OCR is verified on
several other language recognition tasks, including French, Korean, Japanese
and German. All of the above-mentioned models are open-sourced and the code is
available in the GitHub repository: https://github.com/PaddlePaddle/PaddleOCR.
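As a rough usage illustration, the sketch below runs the released pipeline (text detector, direction classifier, text recognizer) through the paddleocr Python package distributed from the repository above. The image path is a placeholder, and the constructor arguments and result layout follow the repository's quick-start documentation but may differ across package versions.

```python
# Minimal sketch of running the PP-OCR pipeline via the paddleocr package
# (pip install paddlepaddle paddleocr). Assumes the quick-start API from the
# PaddleOCR repository; names and result layout may differ by version.
from paddleocr import PaddleOCR

# lang="ch" loads the Chinese/English models; use_angle_cls enables the
# text direction classifier between detection and recognition.
ocr = PaddleOCR(use_angle_cls=True, lang="ch")

result = ocr.ocr("sample_image.jpg", cls=True)  # placeholder image path
# Recent paddleocr versions return one result list per input image;
# older versions return a flat list of lines, so adjust the loop accordingly.
for page in result:
    for box, (text, score) in page:  # detected box, recognized text, confidence
        print(text, score)
```

Passing lang="en" would select the English alphanumeric models instead; exactly which released weights are downloaded depends on the installed package version.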
Related papers
- LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods on our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z)
- EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge [1.8434042562191815]
EffOCR is a novel open-source optical character recognition (OCR) package.
It meets both the computational and sample efficiency requirements for liberating texts at scale.
EffOCR is cheap and sample efficient to train, as the model only needs to learn characters' visual appearance and not how they are used in sequence to form language.
arXiv Detail & Related papers (2023-10-16T04:20:16Z)
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR).
We pre-train a CLIP-like model by aligning printed character images with Ideographic Description Sequences (IDS).
This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character.
The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, on various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
- User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
We show that utilizing OCR reduces the time spent in the manual transcription of culturally valuable documents by over 50%.
Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
arXiv Detail & Related papers (2023-02-26T21:41:15Z)
- Transferring General Multimodal Pretrained Models to Text Recognition [46.33867696799362]
We recast text recognition as image captioning and directly transfer a unified vision-language pretrained model to the end task.
We construct an OCR pipeline with OFA-OCR, and we demonstrate that it can achieve competitive performance with the product-level API.
arXiv Detail & Related papers (2022-12-19T08:30:42Z)
- PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System [11.622321298214043]
PP-OCRv3 upgrades the text detection model and text recognition model in 9 aspects based on PP-OCRv2.
Experiments on real data show that the hmean of PP-OCRv3 is 5% higher than that of PP-OCRv2 under comparable inference speed (hmean is the harmonic mean of detection precision and recall; see the metric sketch after this list).
arXiv Detail & Related papers (2022-06-07T04:33:50Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System [9.376162696601238]
We introduce a bag of tricks to train a better text detector and a better text recognizer.
Experiments on real data show that the precision of PP-OCRv2 is 7% higher than that of PP-OCR under the same inference cost.
arXiv Detail & Related papers (2021-09-07T15:24:40Z)
- An end-to-end Optical Character Recognition approach for ultra-low-resolution printed text images [0.0]
We present a novel method for performing optical character recognition (OCR) on low-resolution images.
This approach is inspired by our understanding of the human visual system, and builds on established neural networks for performing OCR.
We achieve a mean character level accuracy (CLA) of 99.7% and word level accuracy (WLA) of 98.9% across a set of about 1000 pages of 60 dpi text.
arXiv Detail & Related papers (2021-05-10T17:08:06Z)
- An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document.
This paper also proposes an improved font-independent character segmentation algorithm that outperforms the state-of-the-art segmentation algorithms.
arXiv Detail & Related papers (2020-09-18T22:57:03Z)
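For reference on the detection metrics cited in the PP-OCRv2 and PP-OCRv3 entries above, hmean is the harmonic mean of detection precision and recall (the detection F-score). A minimal sketch follows; the numbers in the usage line are illustrative and not taken from the papers.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the detection F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only (not results from the papers):
print(round(hmean(0.90, 0.80), 3))  # -> 0.847
```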