DocTr: Document Image Transformer for Geometric Unwarping and
Illumination Correction
- URL: http://arxiv.org/abs/2110.12942v1
- Date: Mon, 25 Oct 2021 13:27:10 GMT
- Authors: Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li
- Abstract summary: We propose Document Image Transformer (DocTr) to address the issue of geometry and illumination distortion of the document images.
Our DocTr achieves 20.02% Character Error Rate (CER), a 15% absolute improvement over the state-of-the-art methods.
- Score: 99.09177377916369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a new framework, called Document Image Transformer
(DocTr), to address the issue of geometry and illumination distortion of the
document images. Specifically, DocTr consists of a geometric unwarping
transformer and an illumination correction transformer. Using a set of
learned query embeddings, the geometric unwarping transformer captures the
global context of the document image through self-attention and decodes
the pixel-wise displacement solution to correct the geometric distortion. After
geometric unwarping, our illumination correction transformer further removes
the shading artifacts to improve the visual quality and OCR accuracy. Extensive
evaluations are conducted on several datasets, and superior results are
reported against the state-of-the-art methods. Remarkably, our DocTr achieves
20.02% Character Error Rate (CER), a 15% absolute improvement over the
state-of-the-art methods. Moreover, it also shows high efficiency on running
time and parameter count. The results will be available at
https://github.com/fh2019ustc/DocTr for further comparison.
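The abstract reports its headline result as a Character Error Rate (CER). CER is commonly defined as the character-level edit distance between the OCR output and the reference text, divided by the reference length. The following is a minimal sketch of that common definition, not the authors' evaluation code: the OCR engine and any text normalization used in the paper are not stated here, and the function names `levenshtein` and `character_error_rate` are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character of ref
                curr[j - 1] + 1,     # insert a character of hyp
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of eight reference characters.
    print(character_error_rate("document", "docunent"))
```

Because CER is normalized by the reference length, it can exceed 1.0 when the OCR output contains many spurious characters; lower values indicate better rectification for OCR purposes.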
Related papers
- Transformer based Pluralistic Image Completion with Reduced Information Loss [72.92754600354199]
Transformer based methods have achieved great success in image inpainting recently.
They regard each pixel as a token, thus suffering from an information loss issue.
We propose a new transformer-based framework called "PUT".
arXiv Detail & Related papers (2024-03-31T01:20:16Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics.
By comparing the transformer features of the recovered image with those of the target image, the pretrained transformer provides high-resolution, blur-sensitive semantic information.
One approach treats the features as vectors and computes the discrepancy between the representations extracted from the recovered image and the target image in Euclidean space.
arXiv Detail & Related papers (2023-03-24T14:14:25Z)
- Geometric Representation Learning for Document Image Rectification [137.75133384124976]
We present DocGeoNet for document image rectification by introducing explicit geometric representation.
Our motivation arises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image.
Experiments show the effectiveness of our framework and demonstrate the superiority of our framework over state-of-the-art methods.
arXiv Detail & Related papers (2022-10-15T01:57:40Z)
- Document Dewarping with Control Points [36.32190493389662]
We propose a simple yet effective approach to rectify distorted document images by estimating control points and reference points.
Control points are controllable to facilitate interaction or subsequent adjustment.
Experiments show that our approach can rectify document images with various distortion types and yields state-of-the-art performance on a real-world dataset.
arXiv Detail & Related papers (2022-03-20T12:51:14Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network [30.18238229156996]
We propose a framework that uses a fully convolutional network (FCN) to both rectify distorted document images and finely remove the background.
The FCN is trained by regressing displacements of synthesized distorted documents, and to control the smoothness of displacements, we propose a Local Smooth Constraint (LSC) in regularization.
Experiments proved that our approach can dewarp document images effectively under various geometric distortions, and has achieved the state-of-the-art performance in terms of local details and overall effect.
arXiv Detail & Related papers (2021-04-14T12:32:36Z)
- Can You Read Me Now? Content Aware Rectification using Angle Supervision [14.095728009592763]
We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification.
Our method surpasses previous approaches in terms of OCR accuracy, geometric error and visual similarity.
arXiv Detail & Related papers (2020-08-05T16:58:13Z)
- Multistage Curvilinear Coordinate Transform Based Document Image Dewarping using a Novel Quality Estimator [11.342730352935913]
The present work demonstrates a fast and improved technique for dewarping nonlinearly warped document images.
The images are first dewarped at the page-level by estimating optimum inverse projections using curvilinear homography.
The quality of the process is then estimated by evaluating a set of metrics related to the characteristics of the text lines and rectilinear objects.
If the quality is estimated to be unsatisfactory, the page-level dewarping process is repeated with finer approximations.
This is followed by a line-level dewarping process that makes granular corrections to the warps in individual text-lines.
arXiv Detail & Related papers (2020-03-15T17:17:53Z)