Related papers: PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System

URL: http://arxiv.org/abs/2109.03144v1
Date: Tue, 7 Sep 2021 15:24:40 GMT
Title: PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System
Authors: Yuning Du, Chenxia Li, Ruoyu Guo, Cheng Cui, Weiwei Liu, Jun Zhou, Bin Lu, Yehua Yang, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma
Abstract summary: We introduce a bag of tricks to train a better text detector and a better text recognizer. Experiments on real data show that the precision of PP-OCRv2 is 7% higher than PP-OCR under the same inference cost.
Score: 9.376162696601238
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Optical Character Recognition (OCR) systems have been widely used in a variety of application scenarios. Designing an OCR system is still a challenging task. In previous work, we proposed a practical ultra lightweight OCR system (PP-OCR) to balance accuracy against efficiency. In order to improve the accuracy of PP-OCR while keeping high efficiency, in this paper we propose a more robust OCR system, i.e. PP-OCRv2. We introduce a bag of tricks to train a better text detector and a better text recognizer, which include Collaborative Mutual Learning (CML), CopyPaste, Lightweight CPU Network (LCNet), Unified-Deep Mutual Learning (U-DML) and Enhanced CTCLoss. Experiments on real data show that the precision of PP-OCRv2 is 7% higher than PP-OCR under the same inference cost. It is also comparable to the server models of PP-OCR, which use the ResNet series as backbones. All of the above-mentioned models are open-sourced and the code is available in the GitHub repository PaddleOCR, which is powered by PaddlePaddle.

Related papers

Cost-Aware Contrastive Routing for LLMs [57.30288453580456]
We introduce Cost-Spectrum Contrastive Routing (CSCR), a lightweight framework that maps both prompts and models into a shared embedding space. CSCR consistently outperforms baselines, improving the accuracy-cost tradeoff by up to 25%.
arXiv Detail & Related papers (2025-08-17T20:16:44Z)
TFIC: End-to-End Text-Focused Image Compression for Coding for Machines [50.86328069558113]
We present an image compression system designed to retain text-specific features for subsequent Optical Character Recognition (OCR). Our encoding process requires half the time needed by the OCR module, making it especially suitable for devices with limited computational capacity.
arXiv Detail & Related papers (2025-03-25T09:36:13Z)
LMV-RPA: Large Model Voting-based Robotic Process Automation [0.0]
This paper introduces LMV-RPA, a Large Model Voting-based Robotic Process Automation system to enhance OCR. LMV-RPA integrates outputs from OCR engines such as Paddle OCR, Tesseract OCR, Easy OCR, and DocTR with Large Language Models. It achieves 99 percent accuracy in OCR tasks, surpassing baseline models with 94 percent, while reducing processing time by 80 percent.
arXiv Detail & Related papers (2024-12-23T20:28:22Z)
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition [77.28814034644287]
We propose SVTRv2, a CTC model that beats leading EDTRs in both accuracy and inference speed. SVTRv2 introduces novel upgrades to handle text irregularity and utilize linguistic context. We evaluate SVTRv2 in both standard and recent challenging benchmarks.
arXiv Detail & Related papers (2024-11-24T14:21:35Z)
SelfDRSC++: Self-Supervised Learning for Dual Reversed Rolling Shutter Correction [72.05587640928879]
We propose an enhanced Self-supervised learning framework for Dual reversed RS distortion Correction (SelfDRSC++). We introduce a lightweight DRSC network that incorporates a bidirectional correlation matching block to refine the joint optimization of optical flows and corrected RS features. To effectively train the DRSC network, we propose a self-supervised learning strategy that ensures cycle consistency between input and reconstructed dual reversed RS images.
arXiv Detail & Related papers (2024-08-21T08:17:22Z)
DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multiple fonts, mixed scenes and complex layouts seriously affect the recognition accuracy of traditional OCR models. We propose a parameter-efficient mixed text recognition method based on a pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z)
User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
We show that utilizing OCR reduces the time spent in the manual transcription of culturally valuable documents by over 50%. Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
arXiv Detail & Related papers (2023-02-26T21:41:15Z)
Efficient Adversarial Contrastive Learning via Robustness-Aware Coreset Selection [59.77647907277523]
Adversarial contrastive learning (ACL) does not require expensive data annotations but outputs a robust representation that withstands adversarial attacks. ACL needs tremendous running time to generate the adversarial variants of all training data. This paper proposes a robustness-aware coreset selection (RCS) method to speed up ACL.
arXiv Detail & Related papers (2023-02-08T03:20:14Z)
PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System [11.622321298214043]
PP-OCRv3 upgrades the text detection model and text recognition model in 9 aspects based on PP-OCRv2. Experiments on real data show that the hmean of PP-OCRv3 is 5% higher than PP-OCRv2 under comparable inference speed.
arXiv Detail & Related papers (2022-06-07T04:33:50Z)
Digitizing Historical Balance Sheet Data: A Practitioner's Guide [0.30458514384586394]
This paper discusses how to successfully digitize large-scale historical micro-data by augmenting optical character recognition (OCR) engines with pre- and post-processing methods. We apply them against two large balance sheet datasets and introduce "quipucamayoc", a Python package containing these methods in a unified framework.
arXiv Detail & Related papers (2022-03-31T19:18:38Z)
Donut: Document Understanding Transformer without OCR [17.397447819420695]
We propose a novel VDU model that is end-to-end trainable without an underlying OCR framework. Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets.
arXiv Detail & Related papers (2021-11-30T18:55:19Z)
Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages. We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
PP-OCR: A Practical Ultra Lightweight OCR System [8.740684949994664]
We propose a practical ultra lightweight OCR system, i.e., PP-OCR. The overall model size of the PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols.
arXiv Detail & Related papers (2020-09-21T14:57:18Z)
A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology in capturing users' dynamic interests and generating high-quality recommendations. We propose a compressed sequential recommendation framework, termed CpRec, where two generic model shrinking techniques are employed. Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve compression rates of up to 4 to 8 times in real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.