PP-OCR: A Practical Ultra Lightweight OCR System
- URL: http://arxiv.org/abs/2009.09941v3
- Date: Thu, 15 Oct 2020 14:21:53 GMT
- Title: PP-OCR: A Practical Ultra Lightweight OCR System
- Authors: Yuning Du, Chenxia Li, Ruoyu Guo, Xiaoting Yin, Weiwei Liu, Jun Zhou,
  Yifan Bai, Zilin Yu, Yehua Yang, Qingqing Dang, Haoshuang Wang
- Abstract summary: We propose a practical ultra lightweight OCR system, i.e., PP-OCR.
The overall model size of the PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols.
- Score: 8.740684949994664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Optical Character Recognition (OCR) systems are widely used in
a variety of application scenarios, such as office automation (OA) systems,
factory automation, online education, and map production. However, OCR is
still a challenging task due to the variety of text appearances and the demand
for computational efficiency. In this paper, we propose a practical ultra
lightweight OCR system, i.e., PP-OCR. The overall model size of PP-OCR is
only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63
alphanumeric symbols. We introduce a bag of strategies to either
enhance the model ability or reduce the model size. The corresponding ablation
experiments with real data are also provided. Meanwhile, several
pre-trained models for Chinese and English recognition are released,
including a text detector (97K images are used), a direction classifier (600K
images are used), and a text recognizer (17.9M images are used). In addition,
the proposed PP-OCR is also verified on several other language recognition
tasks, including French, Korean, Japanese and German. All of the above
models are open-sourced, and the code is available in the GitHub
repository https://github.com/PaddlePaddle/PaddleOCR.
 
      
Related papers
- PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language [2.1540520105079697]
 We develop a synthetic Pashto OCR dataset, PsOCR, consisting of one million images annotated with bounding boxes at word, line, and document levels.
PsOCR covers variations across 1,000 unique font families, colors, image sizes, and layouts.
A benchmark subset of 10K images was selected to evaluate the performance of several LMMs, including seven open-source models.
Gemini achieves the best performance among all models, whereas among open-source models, Qwen-7B stands out.
 arXiv (2025-05-15T07:58:38Z)
- KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding [24.9462694200992]
 KITAB-Bench is a comprehensive Arabic OCR benchmark that fills the gaps in current evaluation systems.
Modern vision-language models (such as GPT-4, Gemini, and Qwen) outperform traditional OCR approaches by an average of 60% in Character Error Rate (CER).
This work establishes a rigorous evaluation framework that can drive improvements in Arabic document analysis methods.
 arXiv (2025-02-20T18:41:23Z)
- CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy [50.78228433498211]
 CC-OCR comprises four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction.
It includes 39 subsets with 7,058 fully annotated images, of which 41% are sourced from real applications and released for the first time.
We evaluate nine prominent LMMs and reveal both the strengths and weaknesses of these models, particularly in text grounding, multi-orientation, and hallucination of repetition.
 arXiv (2024-12-03T07:03:25Z)
- LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
 We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods in our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
 arXiv (2024-03-04T15:34:12Z)
- EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge [1.8434042562191815]
 EffOCR is a novel open-source optical character recognition (OCR) package.
It meets both the computational and sample efficiency requirements for liberating texts at scale.
EffOCR is cheap and sample efficient to train, as the model only needs to learn characters' visual appearance and not how they are used in sequence to form language.
 arXiv (2023-10-16T04:20:16Z)
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [61.34060587461462]
 We propose a two-stage framework for Chinese Text Recognition (CTR).
We pre-train a CLIP-like model through aligning printed character images and Ideographic Description Sequences (IDS).
This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character.
The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
 arXiv (2023-09-03T05:33:16Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
 We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
 OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
 arXiv (2023-05-13T11:28:37Z)
- User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
 We show that utilizing OCR reduces the time spent in the manual transcription of culturally valuable documents by over 50%.
Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
 arXiv (2023-02-26T21:41:15Z)
- Transferring General Multimodal Pretrained Models to Text Recognition [46.33867696799362]
 We recast text recognition as image captioning and directly transfer a unified vision-language pretrained model to the end task.
We construct an OCR pipeline with OFA-OCR, and we demonstrate that it can achieve competitive performance with the product-level API.
 arXiv (2022-12-19T08:30:42Z)
- PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System [11.622321298214043]
 PP-OCRv3 upgrades the text detection model and text recognition model in 9 aspects based on PP-OCRv2.
Experiments on real data show that the hmean of PP-OCRv3 is 5% higher than PP-OCRv2 under comparable inference speed.
 arXiv (2022-06-07T04:33:50Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
 Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
 arXiv (2021-11-04T04:39:02Z)
- PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System [9.376162696601238]
 We introduce a bag of tricks to train a better text detector and a better text recognizer.
Experiments on real data show that the precision of PP-OCRv2 is 7% higher than PP-OCR under the same inference cost.
 arXiv (2021-09-07T15:24:40Z)
- An end-to-end Optical Character Recognition approach for ultra-low-resolution printed text images [0.0]
 We present a novel method for performing optical character recognition (OCR) on low-resolution images.
This approach is inspired by our understanding of the human visual system, and builds on established neural networks for performing OCR.
We achieve a mean character level accuracy (CLA) of 99.7% and word level accuracy (WLA) of 98.9% across a set of about 1000 pages of 60 dpi text.
 arXiv (2021-05-10T17:08:06Z)
- An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
 This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document.
This paper also proposes an improved font-independent character algorithm that outperforms the state-of-the-art segmentation algorithms.
 arXiv (2020-09-18T22:57:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     