An end-to-end Optical Character Recognition approach for
ultra-low-resolution printed text images
- URL: http://arxiv.org/abs/2105.04515v1
- Date: Mon, 10 May 2021 17:08:06 GMT
- Title: An end-to-end Optical Character Recognition approach for
ultra-low-resolution printed text images
- Authors: Julian D. Gilbey, Carola-Bibiane Schönlieb
- Abstract summary: We present a novel method for performing optical character recognition (OCR) on low-resolution images.
This approach is inspired by our understanding of the human visual system and builds on established neural networks for performing OCR.
We achieve a mean character level accuracy (CLA) of 99.7% and word level accuracy (WLA) of 98.9% across a set of about 1000 pages of 60 dpi text.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Some historical and more recent printed documents have been scanned or stored
at very low resolutions, such as 60 dpi. Though such scans are relatively easy
for humans to read, they still present significant challenges for optical
character recognition (OCR) systems. The current state of the art is to use
super-resolution to reconstruct an approximation of the original
high-resolution image and to feed this into a standard OCR system. Our novel
end-to-end method bypasses the super-resolution step and produces better OCR
results. This approach is inspired by our understanding of the human visual
system, and builds on established neural networks for performing OCR.
Our experiments have shown that it is possible to perform OCR on 60 dpi
scanned images of English text, which is a significantly lower resolution than
the state-of-the-art, and we achieved a mean character level accuracy (CLA) of
99.7% and word level accuracy (WLA) of 98.9% across a set of about 1000 pages
of 60 dpi text in a wide range of fonts. For 75 dpi images, the mean CLA was
99.9% and the mean WLA was 99.4% on the same sample of texts. We make our code
and data (including a set of low-resolution images with their ground truths)
publicly available as a benchmark for future work in this field.
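The abstract reports character level accuracy (CLA) and word level accuracy (WLA) but does not spell out how they are computed. As a rough, non-authoritative sketch, the Python snippet below assumes each metric is 1 minus the normalized Levenshtein edit distance between the OCR output and the ground truth, taken over characters for CLA and over whitespace-separated words for WLA; the paper's own definitions and tooling may differ, and all names here are purely illustrative.

    # Illustrative sketch only: assumes CLA/WLA = 1 - normalized Levenshtein
    # distance (character level and word level); the paper may define them differently.

    def levenshtein(a, b):
        """Edit distance between two sequences (strings or lists of words)."""
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            curr = [i]
            for j, y in enumerate(b, 1):
                curr.append(min(prev[j] + 1,              # deletion
                                curr[j - 1] + 1,          # insertion
                                prev[j - 1] + (x != y)))  # substitution
            prev = curr
        return prev[-1]

    def accuracy(predicted, reference):
        """1 - normalized edit distance, floored at 0 for badly wrong output."""
        if not reference:
            return 1.0 if not predicted else 0.0
        return max(0.0, 1.0 - levenshtein(predicted, reference) / len(reference))

    ocr_output   = "Tbe quick brown fox jumps over the lazy dog"
    ground_truth = "The quick brown fox jumps over the lazy dog"

    cla = accuracy(ocr_output, ground_truth)                  # character level
    wla = accuracy(ocr_output.split(), ground_truth.split())  # word level
    print(f"CLA = {cla:.3f}, WLA = {wla:.3f}")  # here CLA ~ 0.977, WLA ~ 0.889

The mean CLA and WLA figures quoted above would then be averages of such per-page scores over the roughly 1000 benchmark pages.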
Related papers
- Mero Nagarikta: Advanced Nepali Citizenship Data Extractor with Deep Learning-Powered Text Detection and OCR [0.0]
This work proposes a robust system using YOLOv8 for accurate text object detection and an OCR algorithm based on Optimized PyTesseract.
The system, implemented within the context of a mobile application, allows for the automated extraction of important textual information.
The PyTesseract variant optimized for Nepali characters outperformed standard OCR in both flexibility and accuracy.
arXiv Detail & Related papers (2024-10-08T06:29:08Z)
- LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods on our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z)
- Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images [0.07673339435080445]
We propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end.
Specifically, we fine-tune the pretrained instance-level model TrOCR with randomly cropped image chunks.
In our experiments, the model fine-tuned with our strategy achieved a 64.4 F1-score and a 22.8% character error rate.
arXiv Detail & Related papers (2022-12-11T15:45:26Z)
- Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting [49.33891486324731]
We propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework.
It aims to infer images at different small but still recognizable resolutions, achieving a better balance between accuracy and efficiency.
The proposed method can be optimized end-to-end and adopted by any current text spotting framework to improve practicality.
arXiv Detail & Related papers (2022-07-14T06:49:59Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- RayNet: Real-time Scene Arbitrary-shape Text Detection with Multiple Rays [84.15123599963239]
We propose a novel framework for arbitrary-shape text detection, termed RayNet.
RayNet uses a Center Point Set (CPS) and Ray Distance (RD) to fit text: the CPS determines the general position of the text, and the RD is combined with the CPS to compute Ray Points (RP) that localize its precise shape.
RayNet achieves impressive performance on an existing curved text dataset (CTW1500) and a quadrangle text dataset (ICDAR2015).
arXiv Detail & Related papers (2021-04-11T03:03:23Z)
- UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution [74.82282301089994]
In this work, we propose UltraSR, a simple yet effective new network design based on implicit image functions.
We show that spatial encoding is indeed a missing key towards the next-stage high-accuracy implicit image function.
Our UltraSR sets new state-of-the-art performance on the DIV2K benchmark under all super-resolution scales.
arXiv Detail & Related papers (2021-03-23T17:36:42Z)
- PP-OCR: A Practical Ultra Lightweight OCR System [8.740684949994664]
We propose a practical ultra lightweight OCR system, i.e., PP-OCR.
The overall model size of the PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols.
arXiv Detail & Related papers (2020-09-21T14:57:18Z)
- An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document.
This paper also proposes an improved font-independent character segmentation algorithm that outperforms state-of-the-art segmentation algorithms.
arXiv Detail & Related papers (2020-09-18T22:57:03Z)
- Scene Text Image Super-Resolution in the Wild [112.90416737357141]
Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones.
Previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images.
We propose a real scene text SR dataset, termed TextZoom.
It contains paired real low-resolution and high-resolution images captured by cameras with different focal lengths in the wild.
arXiv Detail & Related papers (2020-05-07T09:18:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.