Optimization of Image Processing Algorithms for Character Recognition in
Cultural Typewritten Documents
- URL: http://arxiv.org/abs/2311.15740v1
- Date: Mon, 27 Nov 2023 11:44:46 GMT
- Title: Optimization of Image Processing Algorithms for Character Recognition in
Cultural Typewritten Documents
- Authors: Mariana Dias and Carla Teixeira Lopes
- Abstract summary: This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR).
The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II).
Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
- Score: 0.8158530638728501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linked Data is used in various fields as a new way of structuring and
connecting data. Cultural heritage institutions have been using linked data to
improve archival descriptions and facilitate the discovery of information. Most
archival records have digital representations of physical artifacts in the form
of scanned images that are non-machine-readable. Optical Character Recognition
(OCR) recognizes text in images and translates it into machine-encoded text.
This paper evaluates the impact of image processing methods and parameter
tuning in OCR applied to typewritten cultural heritage documents. The approach
uses a multi-objective problem formulation to minimize Levenshtein edit
distance and maximize the number of words correctly identified with a
non-dominated sorting genetic algorithm (NSGA-II) to tune the methods'
parameters. Evaluation results show that parameterization by digital
representation typology benefits the performance of image pre-processing
algorithms in OCR. Furthermore, our findings suggest that employing image
pre-processing algorithms in OCR might be more suitable for typologies where
the text recognition task without pre-processing does not produce good results.
In particular, Adaptive Thresholding, Bilateral Filter, and Opening are the
best-performing algorithms for the theatre plays' covers, letters, and overall
dataset, respectively, and should be applied before OCR to improve its
performance.
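
The multi-objective setup described in the abstract can be sketched with off-the-shelf tools. The snippet below is a minimal illustration, not the authors' implementation: it assumes OpenCV, pytesseract, the python-Levenshtein package, and pymoo's NSGA-II, and uses hypothetical file names (letter_001.png / letter_001.txt) for a scanned page and its ground-truth transcription. The pre-processing steps mirror the algorithms named in the abstract (bilateral filter, adaptive thresholding, opening), with their parameters exposed as decision variables.

```python
# Hypothetical sketch: tuning OCR pre-processing parameters with NSGA-II.
# Assumes opencv-python, pytesseract, python-Levenshtein, and pymoo (0.6+).
import cv2
import numpy as np
import pytesseract
import Levenshtein
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize


def preprocess(gray, block_size, c, d, sigma, kernel_size):
    """Bilateral filter -> adaptive threshold -> opening (illustrative order)."""
    filtered = cv2.bilateralFilter(gray, d, sigma, sigma)
    block = max(3, block_size | 1)  # adaptiveThreshold needs an odd block size >= 3
    binary = cv2.adaptiveThreshold(filtered, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, block, c)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)


class OcrTuning(ElementwiseProblem):
    """Two objectives: total edit distance (minimize) and word hits (maximize, negated)."""

    def __init__(self, pages):  # pages: list of (grayscale image, transcription)
        self.pages = pages
        super().__init__(n_var=5, n_obj=2,
                         xl=np.array([3, 0, 3, 10, 1]),
                         xu=np.array([51, 20, 15, 150, 5]))

    def _evaluate(self, x, out, *args, **kwargs):
        block, c, d, sigma, k = (int(round(v)) for v in x)
        edit, hits = 0, 0
        for gray, truth in self.pages:
            text = pytesseract.image_to_string(preprocess(gray, block, c, d, sigma, k))
            edit += Levenshtein.distance(text, truth)
            hits += len(set(text.split()) & set(truth.split()))  # crude word-match count
        out["F"] = [edit, -hits]  # NSGA-II minimizes both entries


# Usage sketch with a single hypothetical page and its transcription.
pages = [(cv2.imread("letter_001.png", cv2.IMREAD_GRAYSCALE),
          open("letter_001.txt").read())]
res = minimize(OcrTuning(pages), NSGA2(pop_size=20), ("n_gen", 10), seed=1)
print(res.X, res.F)  # Pareto-optimal parameter sets and their objective values
```

Since the paper reports that parameterization by digital-representation typology helps, a setup closer to the reported one would run a separate optimization per typology (theatre plays' covers, letters, and so on) rather than a single run over the whole collection.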
Related papers
- UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z)
- Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment [0.7499722271664144]
Contrastive Language and Image Pairing (CLIP) is a transformative method in multimedia retrieval.
CLIP typically trains two neural networks concurrently to generate joint embeddings for text and image pairs.
This paper addresses the challenge of optimizing CLIP models for various image-based similarity search scenarios.
arXiv Detail & Related papers (2024-09-03T14:33:01Z)
- Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets.
We introduce a novel method named Decoder Pre-training with only text for STR (DPTR)
DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
- A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing [2.9499386124223257]
We propose a post-processing approach using Natural Language Processing (NLP) tools.
This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.
arXiv Detail & Related papers (2023-07-09T18:51:17Z)
- Text Detection Forgot About Document OCR [0.0]
This paper compares several methods designed for in-the-wild text recognition and for document text recognition.
The results suggest that state-of-the-art methods originally proposed for in-the-wild text detection also achieve excellent results on document text detection.
arXiv Detail & Related papers (2022-10-14T15:37:54Z)
- Image preprocessing and modified adaptive thresholding for improving OCR [0.0]
In this paper, I propose a method to find the major pixel intensity inside the text and threshold the image accordingly.
The results show that this algorithm can be applied efficiently in image processing for OCR.
arXiv Detail & Related papers (2021-11-28T08:13:20Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models [47.48019831416665]
We propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR.
TrOCR is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets.
Experiments show that the TrOCR model outperforms the current state-of-the-art models on both printed and handwritten text recognition tasks.
arXiv Detail & Related papers (2021-09-21T16:01:56Z)
- Unknown-box Approximation to Improve Optical Character Recognition Performance [7.805544279853116]
A novel approach is presented for creating a customized preprocessor for a given OCR engine.
Experiments with two datasets and two OCR engines show that the presented preprocessor improves OCR accuracy by up to 46% over the baseline.
arXiv Detail & Related papers (2021-05-17T16:09:15Z)