Unknown-box Approximation to Improve Optical Character Recognition
Performance
- URL: http://arxiv.org/abs/2105.07983v1
- Date: Mon, 17 May 2021 16:09:15 GMT
- Title: Unknown-box Approximation to Improve Optical Character Recognition
Performance
- Authors: Ayantha Randika, Nilanjan Ray, Xiao Xiao, Allegra Latimer
- Abstract summary: A novel approach is presented for creating a customized preprocessor for a given OCR engine.
Experiments with two datasets and two OCR engines show that the presented preprocessor is able to improve the accuracy of the OCR up to 46% from the baseline.
- Score: 7.805544279853116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical character recognition (OCR) is a widely used pattern recognition
application in numerous domains. There are several feature-rich,
general-purpose OCR solutions available for consumers, which can provide
moderate to excellent accuracy levels. However, accuracy can diminish with
difficult and uncommon document domains. Preprocessing of document images can
be used to minimize the effect of domain shift. In this paper, a novel approach
is presented for creating a customized preprocessor for a given OCR engine.
Unlike the previous OCR agnostic preprocessing techniques, the proposed
approach approximates the gradient of a particular OCR engine to train a
preprocessor module. Experiments with two datasets and two OCR engines show
that the presented preprocessor is able to improve the accuracy of the OCR up
to 46% from the baseline by applying pixel-level manipulations to the document
image. The implementation of the proposed method and the enhanced public
datasets are available for download.
Related papers
- Comparison of Image Preprocessing Techniques for Vehicle License Plate Recognition Using OCR: Performance and Accuracy Evaluation [0.0]
The study uses a dataset of Brazilian vehicle license plates, widely used in OCR applications.
The research provides a detailed analysis of best preprocessing practices, offering insights to optimize OCR performance in real-world scenarios.
arXiv Detail & Related papers (2024-10-15T21:00:27Z) - UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z) - DLoRA-TrOCR: Mixed Text Mode Optical Character Recognition Based On Transformer [12.966765239586994]
Multi- fonts, mixed scenes and complex layouts seriously affect the recognition accuracy of traditional OCR models.
We propose a parameter-efficient mixed text recognition method based on pre-trained OCR Transformer, namely DLoRA-TrOCR.
arXiv Detail & Related papers (2024-04-19T09:28:16Z) - LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods in our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z) - Optimization of Image Processing Algorithms for Character Recognition in
Cultural Typewritten Documents [0.8158530638728501]
This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR)
The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II)
Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
arXiv Detail & Related papers (2023-11-27T11:44:46Z) - Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z) - Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z) - Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z) - Donut: Document Understanding Transformer without OCR [17.397447819420695]
We propose a novel VDU model that is end-to-end trainable without underpinning OCR framework.
Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets.
arXiv Detail & Related papers (2021-11-30T18:55:19Z) - Image preprocessing and modified adaptive thresholding for improving OCR [0.0]
In this paper, I have proposed a method to find the major pixel intensity inside the text and thresholding an image accordingly.
Based on the results obtained, it can be observed that this algorithm can be efficiently applied in the field of image processing for OCR.
arXiv Detail & Related papers (2021-11-28T08:13:20Z) - DocScanner: Robust Document Image Rectification with Progressive
Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.