Enhancing OCR Performance through Post-OCR Models: Adopting Glyph
Embedding for Improved Correction
- URL: http://arxiv.org/abs/2308.15262v1
- Date: Tue, 29 Aug 2023 12:41:50 GMT
- Title: Enhancing OCR Performance through Post-OCR Models: Adopting Glyph
Embedding for Improved Correction
- Authors: Yung-Hsin Chen and Yuli Zhou
- Abstract summary: The novelty of our approach lies in embedding the OCR output using CharBERT and our unique embedding technique, capturing the visual characteristics of characters.
Our findings show that post-OCR correction effectively addresses deficiencies in inferior OCR models, and glyph embedding enables the model to achieve superior results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The study investigates the potential of post-OCR models to overcome
limitations in OCR models and explores the impact of incorporating glyph
embedding on post-OCR correction performance. In this study, we have developed
our own post-OCR correction model. The novelty of our approach lies in
embedding the OCR output using CharBERT and our unique embedding technique,
capturing the visual characteristics of characters. Our findings show that
post-OCR correction effectively addresses deficiencies in inferior OCR models,
and glyph embedding enables the model to achieve superior results, including
the ability to correct individual words.
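To make the glyph-embedding idea concrete, here is a minimal, hypothetical sketch (not the authors' released code): each character of the OCR output is rendered to a small bitmap, and a projection of that bitmap is concatenated with a learned character embedding, which here stands in for the CharBERT representation described in the abstract. All names and dimensions (GlyphAwareEmbedding, render_glyph, GLYPH_SIZE, EMBED_DIM) are illustrative assumptions; the fused features would be fed to a downstream correction model.

  # Minimal sketch of glyph-aware character embedding (assumptions noted above).
  import string

  import torch
  import torch.nn as nn
  from PIL import Image, ImageDraw, ImageFont

  GLYPH_SIZE = 16   # assumed: render each character as a 16x16 bitmap
  EMBED_DIM = 128   # assumed embedding width for both branches

  def render_glyph(char: str, size: int = GLYPH_SIZE) -> torch.Tensor:
      """Render a single character to a flattened grayscale bitmap in [0, 1]."""
      img = Image.new("L", (size, size), color=0)
      draw = ImageDraw.Draw(img)
      draw.text((2, 2), char, fill=255, font=ImageFont.load_default())
      return torch.tensor(list(img.getdata()), dtype=torch.float32) / 255.0

  class GlyphAwareEmbedding(nn.Module):
      """Concatenate a learned character embedding with a projected glyph bitmap."""
      def __init__(self, vocab: str = string.printable):
          super().__init__()
          self.char_to_id = {c: i for i, c in enumerate(vocab)}
          # Plain character embedding used here as a stand-in for CharBERT output.
          self.char_embed = nn.Embedding(len(vocab), EMBED_DIM)
          self.glyph_proj = nn.Linear(GLYPH_SIZE * GLYPH_SIZE, EMBED_DIM)

      def forward(self, text: str) -> torch.Tensor:
          ids = torch.tensor([self.char_to_id.get(c, 0) for c in text])
          glyphs = torch.stack([render_glyph(c) for c in text])
          # Glyph branch works on pixel renderings, so visually similar
          # characters (e.g. '0' and 'O') receive similar glyph features.
          return torch.cat([self.char_embed(ids), self.glyph_proj(glyphs)], dim=-1)

  # Example: embed a noisy OCR output before passing it to a correction model.
  embedder = GlyphAwareEmbedding()
  features = embedder("0CR 0utput")
  print(features.shape)  # torch.Size([10, 256])

The concatenation is one simple way to fuse the two signals; other fusion strategies (summation, gating) would fit the same interface.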
Related papers
- Confidence-Aware Document OCR Error Detection [1.003485566379789]
We analyze the correlation between confidence scores and error rates across different OCR systems.
We develop ConfBERT, a BERT-based model that incorporates OCR confidence scores into token embeddings.
arXiv Detail & Related papers (2024-09-06T08:35:28Z)
- CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models [0.0]
This paper introduces Context Leveraging OCR Correction (CLOCR-C).
It uses the infilling and context-adaptive abilities of transformer-based language models (LMs) to improve OCR quality.
The study aims to determine whether LMs can perform post-OCR correction and improve downstream NLP tasks, and to assess the value of providing socio-cultural context as part of the correction process.
arXiv Detail & Related papers (2024-08-30T17:26:05Z)
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- Data Generation for Post-OCR correction of Cyrillic handwriting [41.94295877935867]
This paper focuses on the development and application of a synthetic handwriting generation engine based on Bézier curves.
The engine generates highly realistic handwritten text in arbitrary quantities, which we use to create a substantial dataset.
We apply a Handwritten Text Recognition (HTR) model to this dataset to identify OCR errors, forming the basis for training our post-OCR correction (POC) model.
arXiv Detail & Related papers (2023-11-27T15:01:26Z)
- Cross-modal Active Complementary Learning with Self-refining Correspondence [54.61307946222386]
We propose a Cross-modal Robust Complementary Learning framework (CRCL) to improve the robustness of existing methods.
Its Active Complementary Loss (ACL) exploits active and complementary learning losses to reduce the risk of providing erroneous supervision.
Its Self-refining Correspondence Correction (SCC) utilizes multiple self-refining processes with momentum correction to enlarge the receptive field for correcting correspondences.
arXiv Detail & Related papers (2023-10-26T15:15:11Z)
- User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
We show that utilizing OCR reduces the time spent on manual transcription of culturally valuable documents by over 50%.
Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
arXiv Detail & Related papers (2023-02-26T21:41:15Z)
- iOCR: Informed Optical Character Recognition for Election Ballot Tallies [13.343515845758398]
iOCR was developed with a spell correction algorithm to fix errors introduced by conventional OCR for vote tabulation.
The results found that the iOCR system outperforms conventional OCR techniques.
arXiv Detail & Related papers (2022-08-01T13:50:13Z)
- An Evaluation of OCR on Egocentric Data [30.637021477342035]
In this paper, we evaluate state-of-the-art OCR methods on Egocentric data.
We demonstrate that existing OCR methods struggle with rotated text, which is frequently observed on objects being handled.
We introduce a simple rotate-and-merge procedure, applicable to pre-trained OCR models, that halves the normalized edit distance error.
arXiv Detail & Related papers (2022-06-11T10:37:20Z)
- Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR).
The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model.
We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
arXiv Detail & Related papers (2021-10-08T05:07:35Z)
- A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction [54.569707226277735]
Existing approaches for grammatical error correction (GEC) rely on supervised learning with manually created GEC datasets.
These datasets contain a non-negligible amount of "noise", where errors were inappropriately edited or left uncorrected.
We propose a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models.
arXiv Detail & Related papers (2020-10-07T04:45:09Z)
- Characteristic Regularisation for Super-Resolving Face Images [81.84939112201377]
Existing facial image super-resolution (SR) methods focus mostly on improving artificially down-sampled low-resolution (LR) imagery, which differs from genuine LR images.
Previous unsupervised domain adaptation (UDA) methods address this issue by training a model using unpaired genuine LR and HR data.
This overstretches the model with two tasks: making the visual characteristics consistent and enhancing the image resolution.
We formulate a method that joins the advantages of conventional SR and UDA models.
arXiv Detail & Related papers (2019-12-30T16:27:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.