PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System
- URL: http://arxiv.org/abs/2109.03144v1
- Date: Tue, 7 Sep 2021 15:24:40 GMT
- Title: PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System
- Authors: Yuning Du, Chenxia Li, Ruoyu Guo, Cheng Cui, Weiwei Liu, Jun Zhou, Bin
Lu, Yehua Yang, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma
- Abstract summary: We introduce bag of tricks to train a better text detector and a better text recognizer.
Experiments on real data show that the precision of PP-OCRv2 is 7% higher than PP-OCR under the same inference cost.
- Score: 9.376162696601238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical Character Recognition (OCR) systems have been widely used in various
of application scenarios. Designing an OCR system is still a challenging task.
In previous work, we proposed a practical ultra lightweight OCR system (PP-OCR)
to balance the accuracy against the efficiency. In order to improve the
accuracy of PP-OCR and keep high efficiency, in this paper, we propose a more
robust OCR system, i.e. PP-OCRv2. We introduce bag of tricks to train a better
text detector and a better text recognizer, which include Collaborative Mutual
Learning (CML), CopyPaste, Lightweight CPUNetwork (LCNet), Unified-Deep Mutual
Learning (U-DML) and Enhanced CTCLoss. Experiments on real data show that the
precision of PP-OCRv2 is 7% higher than PP-OCR under the same inference cost.
It is also comparable to the server models of the PP-OCR which uses ResNet
series as backbones. All of the above mentioned models are open-sourced and the
code is available in the GitHub repository PaddleOCR which is powered by
PaddlePaddle.
Related papers
- A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which ecoded by embedding vectors, resulting in excessively large embedding tables.
Despite the prosperity of lightweight embedding-based RSs, a wide diversity is seen in evaluation protocols.
This study investigates various LERS' performance, efficiency, and cross-task transferability via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z) - DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multi- fonts, mixed scenes and complex layouts seriously affect the recognition accuracy of traditional OCR models.
We propose a parameter-efficient mixed text recognition method based on pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z) - Plug-and-Play Regulators for Image-Text Matching [76.28522712930668]
Exploiting fine-grained correspondence and visual-semantic alignments has shown great potential in image-text matching.
We develop two simple but quite effective regulators which efficiently encode the message output to automatically contextualize and aggregate cross-modal representations.
Experiments on MSCOCO and Flickr30K datasets validate that they can bring an impressive and consistent R@1 gain on multiple models.
arXiv Detail & Related papers (2023-03-23T15:42:05Z) - User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
We show that utilizing OCR reduces the time spent in the manual transcription of culturally valuable documents by over 50%.
Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
arXiv Detail & Related papers (2023-02-26T21:41:15Z) - Efficient Adversarial Contrastive Learning via Robustness-Aware Coreset
Selection [59.77647907277523]
Adversarial contrast learning (ACL) does not require expensive data annotations but outputs a robust representation that withstands adversarial attacks.
ACL needs tremendous running time to generate the adversarial variants of all training data.
This paper proposes a robustness-aware coreset selection (RCS) method to speed up ACL.
arXiv Detail & Related papers (2023-02-08T03:20:14Z) - PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR
System [11.622321298214043]
PP-OCRv3 upgrades the text detection model and text recognition model in 9 aspects based on PP-OCRv2.
Experiments on real data show that the hmean of PP-OCRv3 is 5% higher than PP-OCRv2 under comparable inference speed.
arXiv Detail & Related papers (2022-06-07T04:33:50Z) - Digitizing Historical Balance Sheet Data: A Practitioner's Guide [0.30458514384586394]
This paper discusses how to successfully digitize large-scale historical micro-data by augmenting optical character recognition (OCR) engines with pre- and post-processing methods.
We apply them against two large balance sheet datasets and introduce "quipucamayoc", a Python package containing these methods in a unified framework.
arXiv Detail & Related papers (2022-03-31T19:18:38Z) - Donut: Document Understanding Transformer without OCR [17.397447819420695]
We propose a novel VDU model that is end-to-end trainable without underpinning OCR framework.
Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets.
arXiv Detail & Related papers (2021-11-30T18:55:19Z) - Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z) - PP-OCR: A Practical Ultra Lightweight OCR System [8.740684949994664]
We propose a practical ultra lightweight OCR system, i.e., PP-OCR.
The overall model size of the PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols.
arXiv Detail & Related papers (2020-09-21T14:57:18Z) - A Generic Network Compression Framework for Sequential Recommender
Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology in capturing user's dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed as CpRec, where two generic model shrinking techniques are employed.
By the extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4$sim$8 times compression rates in real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.