3D Rendering Framework for Data Augmentation in Optical Character
Recognition
- URL: http://arxiv.org/abs/2209.14970v1
- Date: Tue, 27 Sep 2022 19:31:23 GMT
- Title: 3D Rendering Framework for Data Augmentation in Optical Character
Recognition
- Authors: Andreas Spruck, Maximiliane Hawesch, Anatol Maier, Christian Riess,
Jürgen Seiler, André Kaup
- Abstract summary: We propose a data augmentation framework for Optical Character Recognition (OCR)
The proposed framework is able to synthesize new viewing angles and illumination scenarios.
We demonstrate the performance of our framework by augmenting a 15% subset of the common Brno Mobile OCR dataset.
- Score: 8.641647607173864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a data augmentation framework for Optical Character
Recognition (OCR). The proposed framework is able to synthesize new viewing
angles and illumination scenarios, effectively enriching any available OCR
dataset. Its modular structure allows it to be adapted to individual user
requirements, and the enlargement factor of the available dataset can be
scaled comfortably. Furthermore, the proposed method is not restricted to
single-frame OCR but can also be applied to video OCR. We demonstrate the
performance of our framework by augmenting a 15% subset of the common Brno
Mobile OCR dataset. Our proposed framework is capable of boosting the
performance of OCR applications, especially for small datasets. Applying the
proposed method, improvements of up to 2.79 percentage points in terms of
Character Error Rate (CER) and up to 7.88 percentage points in terms of Word
Error Rate (WER) are achieved on the subset. In particular, the recognition of
challenging text lines can be improved: for this class, the CER may be decreased
by up to 14.92 percentage points and the WER by up to 18.19 percentage points.
Moreover, we achieve smaller error rates when training on the 15% subset
augmented with the proposed method than on the original non-augmented
full dataset.
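The CER and WER figures reported above are normalized edit distances, computed over characters and over whitespace-split words respectively. As a quick illustration of what these metrics measure (a minimal sketch, not the authors' evaluation code), they can be computed like this:

```python
# Minimal sketch of the CER/WER metrics referenced in the abstract
# (illustrative only, not the paper's implementation): both are
# Levenshtein distances normalized by the reference length -- over
# characters for CER, over whitespace-split tokens for WER.
def levenshtein(ref, hyp):
    # Classic dynamic-programming edit distance between two sequences.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    # Character Error Rate: edit distance over characters.
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    # Word Error Rate: edit distance over word tokens.
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

A "percentage point" improvement in the abstract is then a direct difference between two such rates, e.g. a CER dropping from 0.10 to 0.0721 is an improvement of 2.79 percentage points.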
Related papers
- Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification [54.96876797812238]
We present a novel CrOss-moDal nEighbor Representation(CODER) based on the distance structure between images and their neighbor texts.
The key to construct a high-quality CODER lies in how to create a vast amount of high-quality and diverse texts to match with images.
Experiment results across various datasets and models confirm CODER's effectiveness.
arXiv Detail & Related papers (2024-04-27T02:04:36Z) - DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multiple fonts, mixed scenes, and complex layouts seriously affect the recognition accuracy of traditional OCR models.
We propose a parameter-efficient mixed text recognition method based on pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z) - LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods in our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z) - Enhancing OCR Performance through Post-OCR Models: Adopting Glyph
Embedding for Improved Correction [0.0]
The novelty of our approach lies in embedding the OCR output using CharBERT and our unique embedding technique, capturing the visual characteristics of characters.
Our findings show that post-OCR correction effectively addresses deficiencies in inferior OCR models, and glyph embedding enables the model to achieve superior results.
arXiv Detail & Related papers (2023-08-29T12:41:50Z) - RBSR: Efficient and Flexible Recurrent Network for Burst
Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images.
In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z) - Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned
Receipt Images [0.07673339435080445]
We propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end.
Specifically, we finetune the pretrained instance-level model TrOCR with randomly cropped image chunks.
In our experiments, the model finetuned with our strategy achieved 64.4 F1-score and a 22.8% character error rate.
arXiv Detail & Related papers (2022-12-11T15:45:26Z) - Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text
Spotting [49.33891486324731]
We propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework.
It aims to infer images in different small but recognizable resolutions and achieve a better balance between accuracy and efficiency.
The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve the practicability.
arXiv Detail & Related papers (2022-07-14T06:49:59Z) - TNCR: Table Net Detection and Classification Dataset [62.997667081978825]
TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes.
We have implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines.
We have made TNCR open source in the hope of encouraging more deep learning approaches to table detection, classification, and structure recognition.
arXiv Detail & Related papers (2021-06-19T10:48:58Z) - Unknown-box Approximation to Improve Optical Character Recognition
Performance [7.805544279853116]
A novel approach is presented for creating a customized preprocessor for a given OCR engine.
Experiments with two datasets and two OCR engines show that the presented preprocessor is able to improve the accuracy of the OCR up to 46% from the baseline.
arXiv Detail & Related papers (2021-05-17T16:09:15Z) - Light Field Reconstruction Using Convolutional Network on EPI and
Extended Applications [78.63280020581662]
A novel convolutional neural network (CNN)-based framework is developed for light field reconstruction from a sparse set of views.
We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-24T08:16:32Z) - On-Device Text Image Super Resolution [0.0]
We present a novel deep neural network that reconstructs sharper character edges and thus boosts OCR confidence.
The proposed architecture not only achieves significant improvement in PSNR over bicubic upsampling but also runs with an average inference time of 11.7 ms per image.
We also achieve an OCR accuracy of 75.89% on the ICDAR 2015 TextSR dataset, where ground truth has an accuracy of 78.10%.
arXiv Detail & Related papers (2020-11-20T07:49:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.