Optimization of Image Processing Algorithms for Character Recognition in
Cultural Typewritten Documents
- URL: http://arxiv.org/abs/2311.15740v1
- Date: Mon, 27 Nov 2023 11:44:46 GMT
- Title: Optimization of Image Processing Algorithms for Character Recognition in
Cultural Typewritten Documents
- Authors: Mariana Dias and Carla Teixeira Lopes
- Abstract summary: This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR).
The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II).
Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
- Score: 0.8158530638728501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linked Data is used in various fields as a new way of structuring and
connecting data. Cultural heritage institutions have been using linked data to
improve archival descriptions and facilitate the discovery of information. Most
archival records have digital representations of physical artifacts in the form
of scanned images that are non-machine-readable. Optical Character Recognition
(OCR) recognizes text in images and translates it into machine-encoded text.
This paper evaluates the impact of image processing methods and parameter
tuning in OCR applied to typewritten cultural heritage documents. The approach
uses a multi-objective problem formulation to minimize Levenshtein edit
distance and maximize the number of words correctly identified with a
non-dominated sorting genetic algorithm (NSGA-II) to tune the methods'
parameters. Evaluation results show that parameterization by digital
representation typology benefits the performance of image pre-processing
algorithms in OCR. Furthermore, our findings suggest that employing image
pre-processing algorithms in OCR might be more suitable for typologies where
the text recognition task without pre-processing does not produce good results.
In particular, Adaptive Thresholding, Bilateral Filter, and Opening are the
best-performing algorithms for the theatre plays' covers, letters, and overall
dataset, respectively, and should be applied before OCR to improve its
performance.
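The two objectives named in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the word-matching rule (a multiset overlap of whitespace-separated tokens) is an assumption because the abstract does not define how correct words are counted, and the Pareto filter shows only the non-domination test that NSGA-II builds on, not the full genetic algorithm.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def words_correct(ocr_text: str, truth: str) -> int:
    """Assumed metric: size of the multiset overlap of whitespace tokens."""
    overlap = Counter(ocr_text.split()) & Counter(truth.split())
    return sum(overlap.values())


def dominates(p, q) -> bool:
    """p and q are (edit_distance, words_correct) pairs.

    p dominates q when it is no worse on both objectives
    (lower distance, higher word count) and strictly better on one.
    """
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    better = p[0] < q[0] or p[1] > q[1]
    return no_worse and better


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores for three pre-processing parameter settings.
candidates = [(3, 10), (5, 12), (4, 9)]
front = pareto_front(candidates)
```

Here each candidate stands for one parameter setting of a pre-processing method scored against a ground-truth transcription; a full NSGA-II run would also rank the remaining fronts, apply crowding distance, and evolve the parameter population over generations.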
Related papers
- Sentence-level Prompts Benefit Composed Image Retrieval [69.78119883060006]
Composed image retrieval (CIR) is the task of retrieving specific images by using a query that involves both a reference image and a relative caption.
We propose to leverage pretrained V-L models, e.g., BLIP-2, to generate sentence-level prompts.
Our proposed method performs favorably against the state-of-the-art CIR methods on the Fashion-IQ and CIRR datasets.
arXiv Detail & Related papers (2023-10-09T07:31:44Z)
- A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing [2.9499386124223257]
We propose a post-processing approach using Natural Language Processing (NLP) tools.
This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.
arXiv Detail & Related papers (2023-07-09T18:51:17Z)
- StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training [64.37272287179661]
StrucTexTv2 is an effective document image pre-training framework.
It consists of two self-supervised pre-training tasks: masked image modeling and masked language modeling.
It achieves competitive or even new state-of-the-art performance in various downstream tasks such as image classification, layout analysis, table structure recognition, document OCR, and information extraction.
arXiv Detail & Related papers (2023-03-01T07:32:51Z)
- Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images [0.07673339435080445]
We propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end.
Specifically, we finetune the pretrained instance-level model TrOCR with randomly cropped image chunks.
In our experiments, the model finetuned with our strategy achieved 64.4 F1-score and a 22.8% character error rate.
arXiv Detail & Related papers (2022-12-11T15:45:26Z)
- Text Detection Forgot About Document OCR [0.0]
This paper compares several methods designed for in-the-wild text recognition and for document text recognition.
The results suggest that state-of-the-art methods originally proposed for in-the-wild text detection also achieve excellent results on document text detection.
arXiv Detail & Related papers (2022-10-14T15:37:54Z)
- MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining [68.05105411320842]
We propose a novel approach MaskOCR to unify vision and language pre-training in the classical encoder-decoder recognition framework.
We adopt the masked image modeling approach to pre-train the feature encoder using a large set of unlabeled real text images.
We transform text data into synthesized text images to unify the data modalities of vision and language, and enhance the language modeling capability of the sequence decoder.
arXiv Detail & Related papers (2022-06-01T08:27:19Z)
- Image preprocessing and modified adaptive thresholding for improving OCR [0.0]
In this paper, I have proposed a method to find the major pixel intensity inside the text and thresholding an image accordingly.
Based on the results obtained, it can be observed that this algorithm can be efficiently applied in the field of image processing for OCR.
arXiv Detail & Related papers (2021-11-28T08:13:20Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models [47.48019831416665]
We propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR.
TrOCR is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets.
Experiments show that the TrOCR model outperforms the current state-of-the-art models on both printed and handwritten text recognition tasks.
arXiv Detail & Related papers (2021-09-21T16:01:56Z)
- Unknown-box Approximation to Improve Optical Character Recognition Performance [7.805544279853116]
A novel approach is presented for creating a customized preprocessor for a given OCR engine.
Experiments with two datasets and two OCR engines show that the presented preprocessor is able to improve the accuracy of the OCR up to 46% from the baseline.
arXiv Detail & Related papers (2021-05-17T16:09:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.