Fourier Document Restoration for Robust Document Dewarping and
Recognition
- URL: http://arxiv.org/abs/2203.09910v1
- Date: Fri, 18 Mar 2022 12:39:31 GMT
- Title: Fourier Document Restoration for Robust Document Dewarping and
Recognition
- Authors: Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai
- Abstract summary: This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions.
It dewarps documents by a flexible Thin-Plate Spline transformation which can handle various deformations effectively without requiring deformation annotations in training.
It outperforms the state-of-the-art by large margins on both dewarping and text recognition tasks.
- Score: 73.44057202891011
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: State-of-the-art document dewarping techniques learn to predict 3-dimensional
information of documents which are prone to errors while dealing with documents
with irregular distortions or large variations in depth. This paper presents
FDRNet, a Fourier Document Restoration Network that can restore documents with
different distortions and improve document recognition in a reliable and
simpler manner. FDRNet focuses on high-frequency components in the Fourier
space that capture most structural information but are largely free of
degradation in appearance. It dewarps documents by a flexible Thin-Plate Spline
transformation which can handle various deformations effectively without
requiring deformation annotations in training. These features allow FDRNet to
learn from a small amount of simply labeled training images, and the learned
model can dewarp documents with complex geometric distortion and recognize the
restored texts accurately. To facilitate document restoration research, we
create a benchmark dataset consisting of over one thousand camera documents
with different types of geometric and photometric distortion. Extensive
experiments show that FDRNet outperforms the state-of-the-art by large margins
on both dewarping and text recognition tasks. In addition, FDRNet requires a
small amount of simply labeled training data and is easy to deploy.
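To illustrate the abstract's idea of keeping only high-frequency Fourier components, here is a minimal NumPy sketch. It is not FDRNet's implementation: the function name `high_frequency_component`, the `cutoff` parameter, and the hard radial mask are all assumptions made for illustration, since the abstract does not say how the components are selected.

```python
import numpy as np


def high_frequency_component(image, cutoff=0.1):
    """Return a grayscale image with low spatial frequencies removed.

    Hypothetical helper: `cutoff` is a radius in normalized frequency
    units (cycles per pixel); everything at or below it is zeroed.
    """
    rows, cols = image.shape
    # Move the zero-frequency (DC) term to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Normalized frequency coordinates matching the shifted spectrum.
    fy = np.fft.fftshift(np.fft.fftfreq(rows))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(cols))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Hard high-pass mask (an assumption, not taken from the paper).
    mask = radius > cutoff
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    # The input is real and the mask is symmetric, so the imaginary
    # part is only numerical noise.
    return filtered.real
```

Under these assumptions, smooth appearance variation such as uniform shading falls below the cutoff and is suppressed, while sharp structure such as text strokes and edges is retained.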
Related papers
- DocDiff: Document Enhancement via Residual Diffusion Models [7.972081359533047]
We propose DocDiff, a diffusion-based framework specifically designed for document enhancement problems.
DocDiff consists of two modules: the Coarse Predictor (CP) and the High-Frequency Residual Refinement (HRR) module.
Our proposed HRR module in pre-trained DocDiff is plug-and-play and ready-to-use, with only 4.17M parameters.
arXiv Detail & Related papers (2023-05-06T01:41:10Z)
- DocMAE: Document Image Rectification via Self-supervised Representation Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Geometric Rectification of Creased Document Images based on Isometric Mapping [0.0]
Geometric rectification of images of distorted documents finds wide applications in document digitization and Optical Character Recognition (OCR).
We propose a general framework of document image rectification in which a computational isometric mapping model is utilized for expressing a 3D document model and its flattening in the plane.
Experiments and comparisons to the state-of-the-art approaches demonstrated the effectiveness and outstanding performance of the proposed method.
arXiv Detail & Related papers (2022-12-16T09:33:31Z)
- Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions [52.250269529057014]
Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task.
We propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text.
arXiv Detail & Related papers (2022-08-17T06:55:54Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network [30.18238229156996]
We propose a framework for both rectifying distorted document images and finely removing background, using a fully convolutional network (FCN).
The FCN is trained by regressing displacements of synthesized distorted documents, and to control the smoothness of displacements, we propose a Local Smooth Constraint (LSC) in regularization.
Experiments proved that our approach can dewarp document images effectively under various geometric distortions, and has achieved the state-of-the-art performance in terms of local details and overall effect.
arXiv Detail & Related papers (2021-04-14T12:32:36Z)
- RectiNet-v2: A stacked network architecture for document image dewarping [16.249023269158734]
We propose an end-to-end CNN architecture that can produce distortion free document images from warped documents it takes as input.
We train this model on warped document images simulated synthetically to compensate for lack of enough natural data.
We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-02-01T19:26:17Z)
- Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning [62.34197797857823]
A central problem in automatic reconstruction of shredded documents is the pairwise compatibility evaluation of the shreds.
This work proposes a scalable deep learning approach for measuring pairwise compatibility in which the number of inferences scales linearly.
Our method has accuracy comparable to the state-of-the-art with a speed-up of about 22 times for a test instance with 505 shreds.
arXiv Detail & Related papers (2020-03-23T03:22:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.