Detection Masking for Improved OCR on Noisy Documents
- URL: http://arxiv.org/abs/2205.08257v1
- Date: Tue, 17 May 2022 11:59:18 GMT
- Title: Detection Masking for Improved OCR on Noisy Documents
- Authors: Daniel Rotman, Ophir Azulai, Inbar Shapira, Yevgeny Burshtein, Udi Barzelay
- Abstract summary: We present an improved detection network with a masking system to improve the quality of OCR performed on documents.
We perform a unified evaluation on a publicly available dataset demonstrating the usefulness and broad applicability of our method.
- Score: 8.137198664755596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical Character Recognition (OCR), the task of extracting textual
information from scanned documents, is a vital and broadly used technology for
digitizing and indexing physical documents. Existing technologies perform well
for clean documents, but when the document is visually degraded, or when there
are non-textual elements, OCR quality can be greatly impacted, specifically due
to erroneous detections. In this paper we present an improved detection network
with a masking system to improve the quality of OCR performed on documents. By
filtering non-textual elements from the image we can utilize document-level OCR
to incorporate contextual information to improve OCR results. We perform a
unified evaluation on a publicly available dataset demonstrating the usefulness
and broad applicability of our method. Additionally, we present and make
publicly available our synthetic dataset with a unique hard-negative component
specifically tuned to improve detection results, and evaluate the benefits that
can be gained from its usage.
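The abstract does not spell out how the mask is applied. As a minimal sketch, assuming the detector outputs axis-aligned text boxes and that everything outside them is replaced with a white background before recognition, the filtering step might look like the following. The function name `mask_non_text`, the box format, and the fill value are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def mask_non_text(
    image: Sequence[Sequence[int]],
    text_boxes: Sequence[Box],
    fill: int = 255,
) -> List[List[int]]:
    """Return a copy of a grayscale image with pixels outside all text boxes set to `fill`."""
    height = len(image)
    width = len(image[0]) if height else 0

    # Start from a blank (background-only) page of the same size.
    masked = [[fill] * width for _ in range(height)]

    for x0, y0, x1, y1 in text_boxes:
        # Clip each box to the image bounds so out-of-range detections are harmless.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        for y in range(y0, y1):
            # Copy only the pixels that lie inside the detected text region.
            masked[y][x0:x1] = image[y][x0:x1]

    return masked


if __name__ == "__main__":
    page = [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
    ]
    for row in mask_non_text(page, [(1, 0, 3, 2)]):
        print(row)
```

The masked page, rather than the original scan, would then be passed to a document-level OCR engine, so that stamps, logos, and other non-textual elements cannot produce spurious detections.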
Related papers
- Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents [0.8158530638728501]
This paper evaluates the impact of image processing methods and parameter tuning in Optical Character Recognition (OCR).
The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II).
Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
arXiv Detail & Related papers (2023-11-27T11:44:46Z)
- DocMAE: Document Image Rectification via Self-supervised Representation Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
We show that utilizing OCR reduces the time spent in the manual transcription of culturally valuable documents by over 50%.
Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
arXiv Detail & Related papers (2023-02-26T21:41:15Z)
- Text Detection Forgot About Document OCR [0.0]
This paper compares several methods designed for in-the-wild text recognition and for document text recognition.
The results suggest that state-of-the-art methods originally proposed for in-the-wild text detection also achieve excellent results on document text detection.
arXiv Detail & Related papers (2022-10-14T15:37:54Z)
- EraseNet: A Recurrent Residual Network for Supervised Document Cleaning [0.0]
This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture.
The experiments in this paper have shown promising results as the model is able to learn a variety of ordinary as well as unusual noises and rectify them efficiently.
arXiv Detail & Related papers (2022-10-03T04:23:25Z)
- BusiNet -- a Light and Fast Text Detection Network for Business Documents [8.318686824572803]
We present a detection network dubbed BusiNet aimed at OCR of business documents.
BusiNet was designed to be fast and light so that it could run locally, preventing privacy issues.
The model is made robust to unseen noise by employing adversarial training strategies.
arXiv Detail & Related papers (2022-07-04T06:08:49Z)
- Open Set Classification of Untranscribed Handwritten Documents [56.0167902098419]
Huge amounts of digital page images of important manuscripts are preserved in archives worldwide.
The class or "typology" of a document is perhaps the most important tag to be included in the metadata.
The technical problem is one of automatic classification of documents, each consisting of a set of untranscribed handwritten text images.
arXiv Detail & Related papers (2022-06-20T20:43:50Z)
- Fourier Document Restoration for Robust Document Dewarping and Recognition [73.44057202891011]
This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions.
It dewarps documents by a flexible Thin-Plate Spline transformation which can handle various deformations effectively without requiring deformation annotations in training.
It outperforms the state-of-the-art by large margins on both dewarping and text recognition tasks.
arXiv Detail & Related papers (2022-03-18T12:39:31Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- Enhance to Read Better: An Improved Generative Adversarial Network for Handwritten Document Image Enhancement [1.7491858164568674]
We propose an end-to-end architecture based on Generative Adversarial Networks (GANs) to recover degraded documents into a clean and readable form.
To the best of our knowledge, this is the first work to use the text information while binarizing handwritten documents.
We outperform the state of the art in the H-DIBCO 2018 challenge after fine-tuning our pre-trained model with synthetically degraded Latin handwritten images.
arXiv Detail & Related papers (2021-05-26T17:44:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.