BusiNet -- a Light and Fast Text Detection Network for Business
Documents
- URL: http://arxiv.org/abs/2207.01220v1
- Date: Mon, 4 Jul 2022 06:08:49 GMT
- Title: BusiNet -- a Light and Fast Text Detection Network for Business
Documents
- Authors: Oshri Naparstek, Ophir Azulai, Daniel Rotman, Yevgeny Burshtein, Peter
Staar, Udi Barzelay
- Abstract summary: We present a detection network dubbed BusiNet aimed at OCR of business documents.
BusiNet was designed to be fast and light so that it can run locally, avoiding privacy issues.
The model is made robust to unseen noise by employing adversarial training strategies.
- Score: 8.318686824572803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For digitizing or indexing physical documents, Optical Character Recognition
(OCR), the process of extracting textual information from scanned documents, is
a vital technology. When a document is visually damaged or contains non-textual
elements, existing technologies can yield poor results, as erroneous detection
results can greatly affect the quality of OCR. In this paper we present a
detection network dubbed BusiNet aimed at OCR of business documents. Business
documents often include sensitive information and as such they cannot be
uploaded to a cloud service for OCR. BusiNet was designed to be fast and light
so that it can run locally, avoiding privacy issues. Furthermore, BusiNet is
built to handle scanned document corruption and noise using a specialized
synthetic dataset. The model is made robust to unseen noise by employing
adversarial training strategies. We perform an evaluation on publicly available
datasets demonstrating the usefulness and broad applicability of our model.
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- CTP-Net: Character Texture Perception Network for Document Image Forgery
Localization [28.48117743313255]
We propose a Character Texture Perception Network (CTP-Net) to localize the forged regions in document images.
Because characters that carry semantics in a document image are highly vulnerable, capturing the forgery traces is key to localizing the forged regions.
The proposed CTP-Net is able to localize multi-scale forged areas in document images and outperforms state-of-the-art forgery localization methods.
arXiv Detail & Related papers (2023-08-04T06:37:28Z)
- User-Centric Evaluation of OCR Systems for Kwak'wala [92.73847703011353]
We show that utilizing OCR reduces the time spent in the manual transcription of culturally valuable documents by over 50%.
Our results demonstrate the potential benefits that OCR tools can have on downstream language documentation and revitalization efforts.
arXiv Detail & Related papers (2023-02-26T21:41:15Z)
- EraseNet: A Recurrent Residual Network for Supervised Document Cleaning [0.0]
This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture.
The experiments in this paper have shown promising results as the model is able to learn a variety of ordinary as well as unusual noises and rectify them efficiently.
arXiv Detail & Related papers (2022-10-03T04:23:25Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue:
Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and a system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
- Open Set Classification of Untranscribed Handwritten Documents [56.0167902098419]
Huge amounts of digital page images of important manuscripts are preserved in archives worldwide.
The class or "typology" of a document is perhaps the most important tag to be included in the metadata.
The technical problem is one of automatic classification of documents, each consisting of a set of untranscribed handwritten text images.
arXiv Detail & Related papers (2022-06-20T20:43:50Z)
- Detection Masking for Improved OCR on Noisy Documents [8.137198664755596]
We present an improved detection network with a masking system to improve the quality of OCR performed on documents.
We perform a unified evaluation on a publicly available dataset demonstrating the usefulness and broad applicability of our method.
arXiv Detail & Related papers (2022-05-17T11:59:18Z)
- Fourier Document Restoration for Robust Document Dewarping and
Recognition [73.44057202891011]
This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions.
It dewarps documents by a flexible Thin-Plate Spline transformation which can handle various deformations effectively without requiring deformation annotations in training.
It outperforms the state-of-the-art by large margins on both dewarping and text recognition tasks.
arXiv Detail & Related papers (2022-03-18T12:39:31Z)
- Donut: Document Understanding Transformer without OCR [17.397447819420695]
We propose a novel VDU model that is end-to-end trainable without an underpinning OCR framework.
Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets.
arXiv Detail & Related papers (2021-11-30T18:55:19Z)
- DocScanner: Robust Document Image Rectification with Progressive
Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR
documents [2.6201102730518606]
We demonstrate an effective framework for mitigating OCR errors for any downstream NLP task.
We first address the data scarcity problem for model training by constructing a document synthesis pipeline.
For the benefit of the community, we have made the document synthesis pipeline available as an open-source project.
arXiv Detail & Related papers (2021-08-06T00:32:54Z)