Optimization of Image Processing Algorithms for Character Recognition in
Cultural Typewritten Documents
- URL: http://arxiv.org/abs/2311.15740v1
- Date: Mon, 27 Nov 2023 11:44:46 GMT
- Title: Optimization of Image Processing Algorithms for Character Recognition in
Cultural Typewritten Documents
- Authors: Mariana Dias and Carla Teixeira Lopes
- Abstract summary: This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR).
The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II).
Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
- Score: 0.8158530638728501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linked Data is used in various fields as a new way of structuring and
connecting data. Cultural heritage institutions have been using linked data to
improve archival descriptions and facilitate the discovery of information. Most
archival records have digital representations of physical artifacts in the form
of scanned images that are non-machine-readable. Optical Character Recognition
(OCR) recognizes text in images and translates it into machine-encoded text.
This paper evaluates the impact of image processing methods and parameter
tuning in OCR applied to typewritten cultural heritage documents. The approach
uses a multi-objective problem formulation to minimize Levenshtein edit
distance and maximize the number of words correctly identified with a
non-dominated sorting genetic algorithm (NSGA-II) to tune the methods'
parameters. Evaluation results show that parameterization by digital
representation typology benefits the performance of image pre-processing
algorithms in OCR. Furthermore, our findings suggest that employing image
pre-processing algorithms in OCR might be more suitable for typologies where
the text recognition task without pre-processing does not produce good results.
In particular, Adaptive Thresholding, Bilateral Filter, and Opening are the
best-performing algorithms for the theatre plays' covers, letters, and overall
dataset, respectively, and should be applied before OCR to improve its
performance.
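
The multi-objective setup described in the abstract can be sketched with off-the-shelf tools. The snippet below is a minimal illustration, not the authors' implementation: it assumes OpenCV, pytesseract, the python-Levenshtein package, and pymoo's NSGA-II, and uses hypothetical file names (letter_001.png / letter_001.txt) for a scanned page and its ground-truth transcription. The pre-processing steps mirror the algorithms named in the abstract (bilateral filter, adaptive thresholding, opening), with their parameters exposed as decision variables.

```python
# Hypothetical sketch: tuning OCR pre-processing parameters with NSGA-II.
# Assumes opencv-python, pytesseract, python-Levenshtein, and pymoo (0.6+).
import cv2
import numpy as np
import pytesseract
import Levenshtein
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize


def preprocess(gray, block_size, c, d, sigma, kernel_size):
    """Bilateral filter -> adaptive threshold -> opening (illustrative order)."""
    filtered = cv2.bilateralFilter(gray, d, sigma, sigma)
    block = max(3, block_size | 1)  # adaptiveThreshold needs an odd block size >= 3
    binary = cv2.adaptiveThreshold(filtered, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, block, c)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)


class OcrTuning(ElementwiseProblem):
    """Two objectives: total edit distance (minimize) and word hits (maximize, negated)."""

    def __init__(self, pages):  # pages: list of (grayscale image, transcription)
        self.pages = pages
        super().__init__(n_var=5, n_obj=2,
                         xl=np.array([3, 0, 3, 10, 1]),
                         xu=np.array([51, 20, 15, 150, 5]))

    def _evaluate(self, x, out, *args, **kwargs):
        block, c, d, sigma, k = (int(round(v)) for v in x)
        edit, hits = 0, 0
        for gray, truth in self.pages:
            text = pytesseract.image_to_string(preprocess(gray, block, c, d, sigma, k))
            edit += Levenshtein.distance(text, truth)
            hits += len(set(text.split()) & set(truth.split()))  # crude word-match count
        out["F"] = [edit, -hits]  # NSGA-II minimizes both entries


# Usage sketch with a single hypothetical page and its transcription.
pages = [(cv2.imread("letter_001.png", cv2.IMREAD_GRAYSCALE),
          open("letter_001.txt").read())]
res = minimize(OcrTuning(pages), NSGA2(pop_size=20), ("n_gen", 10), seed=1)
print(res.X, res.F)  # Pareto-optimal parameter sets and their objective values
```

Since the paper reports that parameterization by digital-representation typology helps, a setup closer to the reported one would run a separate optimization per typology (theatre plays' covers, letters, and so on) rather than a single run over the whole collection.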
Related papers
- UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z)
- Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment [0.7499722271664144]
Contrastive Language and Image Pairing (CLIP) is a transformative method in multimedia retrieval.
CLIP typically trains two neural networks concurrently to generate joint embeddings for text and image pairs.
This paper addresses the challenge of optimizing CLIP models for various image-based similarity search scenarios.
arXiv Detail & Related papers (2024-09-03T14:33:01Z)
- Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets.
We introduce a novel method named Decoder Pre-training with only text for STR (DPTR)
DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
- A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing [2.9499386124223257]
We propose a post-processing approach using Natural Language Processing (NLP) tools.
This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.
arXiv Detail & Related papers (2023-07-09T18:51:17Z)
- Text Detection Forgot About Document OCR [0.0]
This paper compares several methods designed for in-the-wild text recognition and for document text recognition.
The results suggest that state-of-the-art methods originally proposed for in-the-wild text detection also achieve excellent results on document text detection.
arXiv Detail & Related papers (2022-10-14T15:37:54Z)
- Image preprocessing and modified adaptive thresholding for improving OCR [0.0]
In this paper, I propose a method to find the major pixel intensity inside the text and threshold the image accordingly.
The results show that this algorithm can be applied efficiently in image processing for OCR.
arXiv Detail & Related papers (2021-11-28T08:13:20Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models [47.48019831416665]
We propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR.
TrOCR is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets.
Experiments show that the TrOCR model outperforms the current state-of-the-art models on both printed and handwritten text recognition tasks.
arXiv Detail & Related papers (2021-09-21T16:01:56Z)
- Unknown-box Approximation to Improve Optical Character Recognition Performance [7.805544279853116]
A novel approach is presented for creating a customized preprocessor for a given OCR engine.
Experiments with two datasets and two OCR engines show that the presented preprocessor improves OCR accuracy by up to 46% over the baseline.
arXiv Detail & Related papers (2021-05-17T16:09:15Z)