3D Rendering Framework for Data Augmentation in Optical Character
Recognition
- URL: http://arxiv.org/abs/2209.14970v1
- Date: Tue, 27 Sep 2022 19:31:23 GMT
- Title: 3D Rendering Framework for Data Augmentation in Optical Character
Recognition
- Authors: Andreas Spruck, Maximiliane Hawesch, Anatol Maier, Christian Riess,
Jürgen Seiler, André Kaup
- Abstract summary: We propose a data augmentation framework for Optical Character Recognition (OCR).
The proposed framework is able to synthesize new viewing angles and illumination scenarios.
We demonstrate the performance of our framework by augmenting a 15% subset of the common Brno Mobile OCR dataset.
- Score: 8.641647607173864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a data augmentation framework for Optical Character
Recognition (OCR). The proposed framework is able to synthesize new viewing
angles and illumination scenarios, effectively enriching any available OCR
dataset. Its modular structure allows it to be modified to match individual user
requirements, and the enlargement factor of the available dataset can be scaled
comfortably. Furthermore, the proposed method is not restricted to
single-frame OCR but can also be applied to video OCR. We demonstrate the
performance of our framework by augmenting a 15% subset of the common Brno
Mobile OCR dataset. Our proposed framework can improve the
performance of OCR applications, especially for small datasets. Applying the
proposed method, improvements of up to 2.79 percentage points in terms of
Character Error Rate (CER), and up to 7.88 percentage points in terms of Word
Error Rate (WER) are achieved on the subset. The recognition of challenging
text lines in particular benefits: for this class, the CER decreases by up to
14.92 percentage points and the WER by up to 18.19 percentage points.
Moreover, we achieve lower error rates when training on the 15%
subset augmented with the proposed method than on the original non-augmented
full dataset.
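The framework's implementation is not reproduced on this page. As a rough illustration of the core idea, the Python sketch below approximates a new viewing angle with a planar homography warp and a new illumination scenario with a random gain/offset; this is a simplified stand-in for the paper's actual 3D rendering pipeline, and all function names and parameter ranges are assumptions.

```python
import numpy as np
import cv2  # OpenCV


def augment_text_line(img: np.ndarray,
                      max_tilt: float = 0.15,
                      max_gain: float = 0.4,
                      rng: np.random.Generator | None = None) -> np.ndarray:
    """Simulate one new viewing angle and illumination for a text-line image.

    A planar homography approximates a small camera tilt; a random
    gain/offset approximates a lighting change. Hypothetical stand-in,
    not the paper's renderer.
    """
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]

    # New viewing angle: jitter the four image corners and warp.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-max_tilt, max_tilt, size=(4, 2)) * [w, h]
    H = cv2.getPerspectiveTransform(src, (src + jitter).astype(np.float32))
    warped = cv2.warpPerspective(img, H, (w, h),
                                 borderMode=cv2.BORDER_REPLICATE)

    # New illumination: multiplicative gain plus additive offset.
    gain = 1.0 + rng.uniform(-max_gain, max_gain)
    offset = rng.uniform(-20.0, 20.0)
    return np.clip(warped.astype(np.float32) * gain + offset,
                   0, 255).astype(img.dtype)


def enlarge(dataset, k=3):
    """Scale the dataset by an enlargement factor of k + 1."""
    for img, label in dataset:
        yield img, label  # keep the original sample
        for _ in range(k):
            yield augment_text_line(img), label
```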
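For reference, CER and WER are edit distances normalized by the reference length, so the "percentage point" improvements above compare two such rates directly. A minimal self-contained definition (not the paper's evaluation code):

```python
def levenshtein(a, b):
    """Edit distance via classic dynamic programming; works on
    character strings (for CER) and word lists (for WER) alike."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    return levenshtein(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)
```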
Related papers
- DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation [0.0]
This paper introduces a novel end-to-end framework that combines ResNet and Vision Transformer backbones with advanced methodologies, including Deformable Convolutions, Retrieval-Augmented Generation, and Conditional Random Fields (CRF). Experiments conducted on six benchmark datasets establish a new state of the art for text recognition, demonstrating the robustness of the approach across diverse and challenging datasets.
arXiv Detail & Related papers (2025-05-07T07:06:04Z) - CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models [0.0]
This paper introduces Context Leveraging OCR Correction (CLOCR-C), which uses the infilling and context-adaptive abilities of transformer-based language models (LMs) to improve OCR quality.
The study aims to determine whether LMs can perform post-OCR correction and improve downstream NLP tasks, and to assess the value of providing socio-cultural context as part of the correction process (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-08-30T17:26:05Z) - Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation [82.95830628372845]
This paper introduces a collaborative vision-text optimizing mechanism within the Open-Vocabulary Segmentation (OVS) field.
To the best of our knowledge, we are the first to establish the collaborative vision-text optimizing mechanism within the OVS field.
In open-vocabulary semantic segmentation, our method outperforms the previous state-of-the-art approaches by +0.5, +2.3, +3.4, +0.4, and +1.1 mIoU on the respective benchmarks.
arXiv Detail & Related papers (2024-08-01T17:48:08Z) - DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multiple fonts, mixed scenes, and complex layouts seriously degrade the recognition accuracy of traditional OCR models.
We propose a parameter-efficient mixed text recognition method based on pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z) - LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods in our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z) - Enhancing OCR Performance through Post-OCR Models: Adopting Glyph
Embedding for Improved Correction [0.0]
The novelty of our approach lies in embedding the OCR output using CharBERT and our unique embedding technique, capturing the visual characteristics of characters.
Our findings show that post-OCR correction effectively addresses deficiencies in inferior OCR models, and glyph embedding enables the model to achieve superior results.
arXiv Detail & Related papers (2023-08-29T12:41:50Z) - FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback [69.4639239117551]
FigCaps-HF is a new framework for figure-caption generation that incorporates domain expert feedback in generating captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences.
arXiv Detail & Related papers (2023-07-20T13:40:22Z) - RBSR: Efficient and Flexible Recurrent Network for Burst
Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images.
In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z) - Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned
Receipt Images [0.07673339435080445]
We propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end.
Specifically, we finetune the pretrained instance-level model TrOCR with randomly cropped image chunks.
In our experiments, the model finetuned with our strategy achieved an F1-score of 64.4 and a 22.8% character error rate.
arXiv Detail & Related papers (2022-12-11T15:45:26Z) - Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text
Spotting [49.33891486324731]
We propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework.
It aims to infer images at small but still recognizable resolutions, achieving a better balance between accuracy and efficiency.
The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve its practicality.
arXiv Detail & Related papers (2022-07-14T06:49:59Z) - TNCR: Table Net Detection and Classification Dataset [62.997667081978825]
The TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes.
We have implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines.
We have made TNCR open source in the hope of encouraging more deep learning approaches to table detection, classification, and structure recognition.
arXiv Detail & Related papers (2021-06-19T10:48:58Z) - Light Field Reconstruction Using Convolutional Network on EPI and
Extended Applications [78.63280020581662]
A novel convolutional neural network (CNN)-based framework is developed for light field reconstruction from a sparse set of views.
We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-24T08:16:32Z) - On-Device Text Image Super Resolution [0.0]
We present a novel deep neural network that reconstructs sharper character edges and thus boosts OCR confidence.
The proposed architecture not only achieves significant improvement in PSNR over bicubic upsampling but also runs with an average inference time of 11.7 ms per image.
We also achieve an OCR accuracy of 75.89% on the ICDAR 2015 TextSR dataset, where ground truth has an accuracy of 78.10%.
arXiv Detail & Related papers (2020-11-20T07:49:48Z)
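As referenced in the CLOCR-C entry above, the sketch below shows what LM-based post-OCR correction can look like. The model choice, prompt wording, and the correct_ocr helper are illustrative assumptions, not the paper's actual setup.

```python
from transformers import pipeline

# Any instruction-following text2text model can stand in here.
corrector = pipeline("text2text-generation", model="google/flan-t5-base")


def correct_ocr(ocr_text: str, context: str = "") -> str:
    """Ask the LM to repair OCR errors; the optional context string
    mirrors the paper's use of socio-cultural context during correction."""
    prompt = (f"Correct the OCR errors in the following text. {context}\n"
              f"Text: {ocr_text}\nCorrected:")
    return corrector(prompt, max_new_tokens=256)[0]["generated_text"]


print(correct_ocr("Tbe qnick hrown f0x jumqed over",
                  context="This is a 19th-century newspaper article."))
```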
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences arising from its use.