3D Rendering Framework for Data Augmentation in Optical Character
Recognition
- URL: http://arxiv.org/abs/2209.14970v1
- Date: Tue, 27 Sep 2022 19:31:23 GMT
- Title: 3D Rendering Framework for Data Augmentation in Optical Character
Recognition
- Authors: Andreas Spruck, Maximiliane Hawesch, Anatol Maier, Christian Riess,
Jürgen Seiler, André Kaup
- Abstract summary: We propose a data augmentation framework for Optical Character Recognition (OCR).
The proposed framework is able to synthesize new viewing angles and illumination scenarios.
We demonstrate the performance of our framework by augmenting a 15% subset of the common Brno Mobile OCR dataset.
- Score: 8.641647607173864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a data augmentation framework for Optical Character
Recognition (OCR). The proposed framework is able to synthesize new viewing
angles and illumination scenarios, effectively enriching any available OCR
dataset. Its modular structure allows it to be modified to match individual user
requirements, and the enlargement factor of the available dataset can be scaled
comfortably. Furthermore, the proposed method is not restricted to
single-frame OCR but can also be applied to video OCR. We demonstrate the
performance of our framework by augmenting a 15% subset of the common Brno
Mobile OCR dataset. Our proposed framework can improve the
performance of OCR applications, especially for small datasets. Applying the
proposed method, improvements of up to 2.79 percentage points in terms of
Character Error Rate (CER), and up to 7.88 percentage points in terms of Word
Error Rate (WER) are achieved on the subset. The recognition of challenging
text lines in particular benefits: for this class, the CER decreases by up to
14.92 percentage points and the WER by up to 18.19 percentage points.
Moreover, we achieve lower error rates when training on the 15%
subset augmented with the proposed method than on the original non-augmented
full dataset.
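The framework's implementation is not reproduced on this page. As a rough illustration of the core idea, the Python sketch below approximates a new viewing angle with a planar homography warp and a new illumination scenario with a random gain/offset; this is a simplified stand-in for the paper's actual 3D rendering pipeline, and all function names and parameter ranges are assumptions.

```python
import numpy as np
import cv2  # OpenCV


def augment_text_line(img: np.ndarray,
                      max_tilt: float = 0.15,
                      max_gain: float = 0.4,
                      rng: np.random.Generator | None = None) -> np.ndarray:
    """Simulate one new viewing angle and illumination for a text-line image.

    A planar homography approximates a small camera tilt; a random
    gain/offset approximates a lighting change. Hypothetical stand-in,
    not the paper's renderer.
    """
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]

    # New viewing angle: jitter the four image corners and warp.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-max_tilt, max_tilt, size=(4, 2)) * [w, h]
    H = cv2.getPerspectiveTransform(src, (src + jitter).astype(np.float32))
    warped = cv2.warpPerspective(img, H, (w, h),
                                 borderMode=cv2.BORDER_REPLICATE)

    # New illumination: multiplicative gain plus additive offset.
    gain = 1.0 + rng.uniform(-max_gain, max_gain)
    offset = rng.uniform(-20.0, 20.0)
    return np.clip(warped.astype(np.float32) * gain + offset,
                   0, 255).astype(img.dtype)


def enlarge(dataset, k=3):
    """Scale the dataset by an enlargement factor of k + 1."""
    for img, label in dataset:
        yield img, label  # keep the original sample
        for _ in range(k):
            yield augment_text_line(img), label
```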
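For reference, CER and WER are edit distances normalized by the reference length, so the "percentage point" improvements above compare two such rates directly. A minimal self-contained definition (not the paper's evaluation code):

```python
def levenshtein(a, b):
    """Edit distance via classic dynamic programming; works on
    character strings (for CER) and word lists (for WER) alike."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    return levenshtein(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    return levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1)
```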
Related papers
- DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation [0.0]
This paper introduces a novel end-to-end framework that combines ResNet and Vision Transformer backbones with advanced methodologies, including Deformable Convolutions, Retrieval-Augmented Generation, and Conditional Random Fields (CRF). Experiments conducted on six benchmark datasets establish a new state of the art for text recognition, demonstrating the robustness of the approach across diverse and challenging datasets.
arXiv Detail & Related papers (2025-05-07T07:06:04Z) - CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models [0.0]
This paper introduces Context Leveraging OCR Correction (CLOCR-C), which uses the infilling and context-adaptive abilities of transformer-based language models (LMs) to improve OCR quality.
The study aims to determine whether LMs can perform post-OCR correction and improve downstream NLP tasks, and to assess the value of providing socio-cultural context as part of the correction process (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-08-30T17:26:05Z) - Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation [82.95830628372845]
This paper introduces a collaborative vision-text optimizing mechanism within the Open-Vocabulary Segmentation (OVS) field.
To the best of our knowledge, we are the first to establish the collaborative vision-text optimizing mechanism within the OVS field.
In open-vocabulary semantic segmentation, our method outperforms the previous state-of-the-art approaches by +0.5, +2.3, +3.4, +0.4, and +1.1 mIoU on the respective benchmarks.
arXiv Detail & Related papers (2024-08-01T17:48:08Z) - DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multiple fonts, mixed scenes, and complex layouts seriously degrade the recognition accuracy of traditional OCR models.
We propose a parameter-efficient mixed text recognition method based on pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z) - LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods in our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z) - Enhancing OCR Performance through Post-OCR Models: Adopting Glyph
Embedding for Improved Correction [0.0]
The novelty of our approach lies in embedding the OCR output using CharBERT and our unique embedding technique, capturing the visual characteristics of characters.
Our findings show that post-OCR correction effectively addresses deficiencies in inferior OCR models, and glyph embedding enables the model to achieve superior results.
arXiv Detail & Related papers (2023-08-29T12:41:50Z) - FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback [69.4639239117551]
FigCaps-HF is a new framework for figure-caption generation that incorporates domain expert feedback in generating captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences.
arXiv Detail & Related papers (2023-07-20T13:40:22Z) - RBSR: Efficient and Flexible Recurrent Network for Burst
Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images.
In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z) - Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned
Receipt Images [0.07673339435080445]
We propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end.
Specifically, we finetune the pretrained instance-level model TrOCR with randomly cropped image chunks.
In our experiments, the model finetuned with our strategy achieved an F1-score of 64.4 and a 22.8% character error rate.
arXiv Detail & Related papers (2022-12-11T15:45:26Z) - Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text
Spotting [49.33891486324731]
We propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework.
It aims to infer images at small but still recognizable resolutions, achieving a better balance between accuracy and efficiency.
The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve its practicality.
arXiv Detail & Related papers (2022-07-14T06:49:59Z) - TNCR: Table Net Detection and Classification Dataset [62.997667081978825]
The TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes.
We have implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines.
We have made TNCR open source in the hope of encouraging more deep learning approaches to table detection, classification, and structure recognition.
arXiv Detail & Related papers (2021-06-19T10:48:58Z) - Light Field Reconstruction Using Convolutional Network on EPI and
Extended Applications [78.63280020581662]
A novel convolutional neural network (CNN)-based framework is developed for light field reconstruction from a sparse set of views.
We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-24T08:16:32Z) - On-Device Text Image Super Resolution [0.0]
We present a novel deep neural network that reconstructs sharper character edges and thus boosts OCR confidence.
The proposed architecture not only achieves significant improvement in PSNR over bicubic upsampling but also runs with an average inference time of 11.7 ms per image.
We also achieve an OCR accuracy of 75.89% on the ICDAR 2015 TextSR dataset, where ground truth has an accuracy of 78.10%.
arXiv Detail & Related papers (2020-11-20T07:49:48Z)
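As referenced in the CLOCR-C entry above, the sketch below shows what LM-based post-OCR correction can look like. The model choice, prompt wording, and the correct_ocr helper are illustrative assumptions, not the paper's actual setup.

```python
from transformers import pipeline

# Any instruction-following text2text model can stand in here.
corrector = pipeline("text2text-generation", model="google/flan-t5-base")


def correct_ocr(ocr_text: str, context: str = "") -> str:
    """Ask the LM to repair OCR errors; the optional context string
    mirrors the paper's use of socio-cultural context during correction."""
    prompt = (f"Correct the OCR errors in the following text. {context}\n"
              f"Text: {ocr_text}\nCorrected:")
    return corrector(prompt, max_new_tokens=256)[0]["generated_text"]


print(correct_ocr("Tbe qnick hrown f0x jumqed over",
                  context="This is a 19th-century newspaper article."))
```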
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences arising from its use.