DocScanner: Robust Document Image Rectification with Progressive
Learning
- URL: http://arxiv.org/abs/2110.14968v1
- Date: Thu, 28 Oct 2021 09:15:02 GMT
- Title: DocScanner: Robust Document Image Rectification with Progressive
Learning
- Authors: Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li
- Abstract summary: This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to robust and superior performance, and the lightweight recurrent architecture ensures running efficiency.
- Score: 162.03694280524084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared to flatbed scanners, portable smartphones are much more convenient
for digitizing physical documents. However, such digitized documents are often
distorted due to uncontrolled physical deformations, camera positions, and
illumination variations. To this end, this work presents DocScanner, a new deep
network architecture for document image rectification. Unlike existing
methods, DocScanner addresses this issue by introducing a progressive learning
mechanism. Specifically, DocScanner maintains a single estimate of the
rectified image, which is progressively corrected with a recurrent
architecture. The iterative refinements make DocScanner converge to robust
and superior performance, and the lightweight recurrent architecture ensures
running efficiency. In addition, motivated by the corrupted boundaries in the
rectified images of prior works, DocScanner applies a document localization
module before rectification to explicitly segment the foreground document from
the cluttered background. To further improve the rectification quality, a
geometric regularization based on the geometric prior between the distorted
and the rectified images is introduced during training. Extensive experiments
are conducted on the Doc3D dataset and the DocUNet benchmark dataset, and the
quantitative and qualitative evaluation results verify the effectiveness of
DocScanner, which outperforms previous methods on OCR accuracy, image
similarity, and our proposed distortion metric by a considerable margin.
Furthermore, DocScanner shows the highest efficiency in inference time and
parameter count.
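The progressive correction described in the abstract (one running estimate, refined step by step) can be illustrated with a minimal NumPy sketch. This is not the authors' network: `toy_update` is a hypothetical stand-in for the learned recurrent unit and, unlike a real model, it is handed the ground-truth sampling grid. The function names, the step size, and the nearest-neighbour resampling are all assumptions made for illustration.

```python
import numpy as np

def identity_grid(h, w):
    """Sampling grid of (row, col) coordinates that leaves an image unchanged."""
    rows, cols = np.meshgrid(np.arange(h, dtype=float),
                             np.arange(w, dtype=float), indexing="ij")
    return np.stack([rows, cols], axis=-1)  # shape (h, w, 2)

def warp_nearest(image, grid):
    """Resample `image` at the coordinates in `grid` (nearest neighbour, clipped)."""
    h, w = image.shape[:2]
    rows = np.clip(np.rint(grid[..., 0]), 0, h - 1).astype(int)
    cols = np.clip(np.rint(grid[..., 1]), 0, w - 1).astype(int)
    return image[rows, cols]

def toy_update(grid, target_grid, step=0.5):
    """Hypothetical stand-in for a learned recurrent unit.

    Assumption: it sees the true grid and returns a residual correction;
    a trained model would predict this residual from image features.
    """
    return step * (target_grid - grid)

def progressive_rectify(image, target_grid, iterations=8):
    """Keep a single grid estimate and correct it a little on every iteration."""
    grid = identity_grid(*image.shape[:2])  # the single running estimate
    errors = []
    for _ in range(iterations):
        grid = grid + toy_update(grid, target_grid)
        errors.append(float(np.abs(target_grid - grid).mean()))
    return warp_nearest(image, grid), grid, errors

if __name__ == "__main__":
    img = np.arange(64, dtype=float).reshape(8, 8)
    target = identity_grid(8, 8)
    target[..., 1] += 2.0  # toy "distortion": sample two columns to the right
    _, _, errs = progressive_rectify(img, target)
    print([round(e, 4) for e in errs])  # mean grid error after each iteration
```

In this toy setting the error halves on every iteration because the update moves the estimate halfway to the target; a learned update would offer no such guarantee.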
Related papers
- DocDiff: Document Enhancement via Residual Diffusion Models [7.972081359533047]
We propose DocDiff, a diffusion-based framework specifically designed for document enhancement problems.
DocDiff consists of two modules: the Coarse Predictor (CP) and the High-Frequency Residual Refinement (HRR) module.
Our proposed HRR module in pre-trained DocDiff is plug-and-play and ready-to-use, with only 4.17M parameters.
arXiv Detail & Related papers (2023-05-06T01:41:10Z)
- DocMAE: Document Image Rectification via Self-supervised Representation Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Fourier Document Restoration for Robust Document Dewarping and Recognition [73.44057202891011]
This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions.
It dewarps documents by a flexible Thin-Plate Spline transformation which can handle various deformations effectively without requiring deformation annotations in training.
It outperforms the state-of-the-art by large margins on both dewarping and text recognition tasks.
arXiv Detail & Related papers (2022-03-18T12:39:31Z)
- DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction [99.09177377916369]
We propose Document Image Transformer (DocTr) to address the issue of geometry and illumination distortion of the document images.
Our DocTr achieves 20.02% Character Error Rate (CER), a 15% absolute improvement over the state-of-the-art methods.
arXiv Detail & Related papers (2021-10-25T13:27:10Z)
- Can You Read Me Now? Content Aware Rectification using Angle Supervision [14.095728009592763]
We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification.
Our method surpasses previous approaches in terms of OCR accuracy, geometric error and visual similarity.
arXiv Detail & Related papers (2020-08-05T16:58:13Z)
- Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning [62.34197797857823]
A central problem in automatic reconstruction of shredded documents is the pairwise compatibility evaluation of the shreds.
This work proposes a scalable deep learning approach for measuring pairwise compatibility in which the number of inferences scales linearly.
Our method has accuracy comparable to the state-of-the-art with a speed-up of about 22 times for a test instance with 505 shreds.
arXiv Detail & Related papers (2020-03-23T03:22:06Z)
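Several entries above report OCR quality, and the DocTr entry quotes a Character Error Rate (CER). As a reference point, the sketch below uses the common definition of CER: the Levenshtein edit distance between recognized and reference text, divided by the reference length. The function names are illustrative, and the exact evaluation protocol of the listed papers (OCR engine, text normalization) is not specified here and may differ.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)

if __name__ == "__main__":
    # One substituted character out of eight reference characters.
    print(character_error_rate("document", "docunent"))
```

Because insertions count as errors, CER can exceed 1.0 when the OCR output is much longer than the reference.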