PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System
- URL: http://arxiv.org/abs/2109.03144v1
- Date: Tue, 7 Sep 2021 15:24:40 GMT
- Title: PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System
- Authors: Yuning Du, Chenxia Li, Ruoyu Guo, Cheng Cui, Weiwei Liu, Jun Zhou, Bin
Lu, Yehua Yang, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma
- Abstract summary: We introduce a bag of tricks to train a better text detector and a better text recognizer.
Experiments on real data show that the precision of PP-OCRv2 is 7% higher than PP-OCR under the same inference cost.
- Score: 9.376162696601238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical Character Recognition (OCR) systems have been widely used in various
application scenarios. Designing an OCR system is still a challenging task.
In previous work, we proposed a practical ultra lightweight OCR system (PP-OCR)
to balance the accuracy against the efficiency. In order to improve the
accuracy of PP-OCR and keep high efficiency, in this paper, we propose a more
robust OCR system, i.e. PP-OCRv2. We introduce a bag of tricks to train a better
text detector and a better text recognizer, which include Collaborative Mutual
Learning (CML), CopyPaste, Lightweight CPU Network (LCNet), Unified-Deep Mutual
Learning (U-DML) and Enhanced CTCLoss. Experiments on real data show that the
precision of PP-OCRv2 is 7% higher than PP-OCR under the same inference cost.
It is also comparable to the server models of PP-OCR, which use the ResNet
series as backbones. All of the above-mentioned models are open-sourced and the
code is available in the GitHub repository PaddleOCR, which is powered by
PaddlePaddle.
Related papers
- SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition [77.28814034644287]
We propose SVTRv2, a CTC model that beats leading EDTRs in both accuracy and inference speed.
SVTRv2 introduces novel upgrades to handle text irregularity and utilize linguistic context.
We evaluate SVTRv2 in both standard and recent challenging benchmarks.
arXiv Detail & Related papers (2024-11-24T14:21:35Z)
- SelfDRSC++: Self-Supervised Learning for Dual Reversed Rolling Shutter Correction [72.05587640928879]
We propose an enhanced Self-supervised learning framework for Dual reversed RS distortion Correction (SelfDRSC++).
We introduce a lightweight DRSC network that incorporates a bidirectional correlation matching block to refine the joint optimization of optical flows and corrected RS features.
To effectively train the DRSC network, we propose a self-supervised learning strategy that ensures cycle consistency between input and reconstructed dual reversed RS images.
arXiv Detail & Related papers (2024-08-21T08:17:22Z)
- DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multi-fonts, mixed scenes and complex layouts seriously affect the recognition accuracy of traditional OCR models.
We propose a parameter-efficient mixed text recognition method based on pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z)
- User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
We show that utilizing OCR reduces the time spent in the manual transcription of culturally valuable documents by over 50%.
Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
arXiv Detail & Related papers (2023-02-26T21:41:15Z)
- Efficient Adversarial Contrastive Learning via Robustness-Aware Coreset Selection [59.77647907277523]
Adversarial contrastive learning (ACL) does not require expensive data annotations but outputs a robust representation that withstands adversarial attacks.
ACL needs tremendous running time to generate the adversarial variants of all training data.
This paper proposes a robustness-aware coreset selection (RCS) method to speed up ACL.
arXiv Detail & Related papers (2023-02-08T03:20:14Z)
- PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System [11.622321298214043]
PP-OCRv3 upgrades the text detection model and text recognition model in 9 aspects based on PP-OCRv2.
Experiments on real data show that the hmean of PP-OCRv3 is 5% higher than PP-OCRv2 under comparable inference speed.
arXiv Detail & Related papers (2022-06-07T04:33:50Z)
- Digitizing Historical Balance Sheet Data: A Practitioner's Guide [0.30458514384586394]
This paper discusses how to successfully digitize large-scale historical micro-data by augmenting optical character recognition (OCR) engines with pre- and post-processing methods.
We apply them against two large balance sheet datasets and introduce "quipucamayoc", a Python package containing these methods in a unified framework.
arXiv Detail & Related papers (2022-03-31T19:18:38Z)
- Donut: Document Understanding Transformer without OCR [17.397447819420695]
We propose a novel VDU model that is end-to-end trainable without underpinning OCR framework.
Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets.
arXiv Detail & Related papers (2021-11-30T18:55:19Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- PP-OCR: A Practical Ultra Lightweight OCR System [8.740684949994664]
We propose a practical ultra lightweight OCR system, i.e., PP-OCR.
The overall model size of the PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols.
arXiv Detail & Related papers (2020-09-21T14:57:18Z)
- A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology in capturing user's dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed as CpRec, where two generic model shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4$\sim$8 times compression rates in real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.