Deep Unrestricted Document Image Rectification
- URL: http://arxiv.org/abs/2304.08796v2
- Date: Sun, 17 Dec 2023 17:18:33 GMT
- Title: Deep Unrestricted Document Image Rectification
- Authors: Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li
- Abstract summary: We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, tremendous efforts have been made on document image
rectification, but existing advanced algorithms are limited to processing
restricted document images, i.e., the input images must contain a complete
document. When the captured image covers only a local text region, the
rectification quality degrades noticeably. Our previously proposed
DocTr, a transformer-assisted network for document image rectification, also
suffers from this limitation. In this work, we present DocTr++, a novel unified
framework for document image rectification, without any restrictions on the
input distorted images. Our major technical improvements can be summarized in
three aspects. Firstly, we upgrade the original architecture by adopting a
hierarchical encoder-decoder structure for multi-scale representation
extraction and parsing. Secondly, we reformulate the pixel-wise mapping
relationship between the unrestricted distorted document images and the
distortion-free counterparts; the data obtained under this formulation is used to train our DocTr++
for unrestricted document image rectification. Thirdly, we contribute a
real-world test set and metrics applicable for evaluating the rectification
quality. To the best of our knowledge, this is the first learning-based method for the
rectification of unrestricted document images. Extensive experiments are
conducted, and the results demonstrate the effectiveness and superiority of our
method. We hope our DocTr++ will serve as a strong baseline for generic
document image rectification, promoting the further advancement and application
of learning-based algorithms. The source code and the proposed dataset are
publicly available at https://github.com/fh2019ustc/DocTr-Plus.
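At the core of this family of rectification methods is a dense backward map: for each pixel of the rectified output, the network predicts where to sample in the distorted input. Below is a minimal sketch of that resampling step, assuming a PyTorch model whose output is a backward map in normalized [-1, 1] coordinates; the names `rectify`, `backward_map`, and `model` are illustrative, not the actual DocTr++ interface.

```python
import torch
import torch.nn.functional as F

def rectify(distorted: torch.Tensor, backward_map: torch.Tensor) -> torch.Tensor:
    """Resample a distorted image with a predicted backward map.

    distorted:    (B, 3, H, W) input image.
    backward_map: (B, H, W, 2) sampling coordinates in normalized
                  [-1, 1] space, one (x, y) pair per output pixel.
    """
    # For every output pixel, bilinearly fetch the input pixel
    # that the backward map points to.
    return F.grid_sample(distorted, backward_map,
                         mode="bilinear", align_corners=True)

# Hypothetical usage, where `model` predicts the backward map:
# rectified = rectify(img, model(img))
```

Training then amounts to regressing this map against ground-truth correspondences between distorted and flat document pairs.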
Related papers
- A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement (arXiv, 2023-12-06)
  We propose T2T-BinFormer, a novel document binarization encoder-decoder architecture based on a Tokens-to-Token vision transformer. Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms existing CNN- and ViT-based state-of-the-art methods.
arXiv Detail & Related papers (2023-12-06T23:01:11Z) - DocMAE: Document Image Rectification via Self-supervised Representation
Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
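The masking step described in the DocMAE summary can be pictured as hiding a random subset of image patches and asking a network to reconstruct them. The following is a rough, hypothetical sketch; the patch size and mask ratio are assumed values, not DocMAE's actual settings.

```python
import torch

def mask_random_patches(img: torch.Tensor, patch: int = 16,
                        mask_ratio: float = 0.75) -> torch.Tensor:
    """Zero out a random subset of non-overlapping patches.

    img: (B, C, H, W) with H and W divisible by `patch`.
    """
    b, c, h, w = img.shape
    gh, gw = h // patch, w // patch
    # One keep/drop decision per patch, drawn independently per image;
    # a patch survives with probability (1 - mask_ratio).
    keep = (torch.rand(b, 1, gh, gw, device=img.device) > mask_ratio).float()
    # Upsample the patch-level mask to pixel resolution and apply it.
    mask = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return img * mask

# A reconstruction network is then trained to predict the hidden pixels
# from the visible ones, e.g. with a mean-squared error on masked regions.
```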
arXiv Detail & Related papers (2023-04-20T14:27:15Z) - Unifying Vision, Text, and Layout for Universal Document Processing [105.36490575974028]
We propose a Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation.
Our method sets the state-of-the-art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites.
arXiv Detail & Related papers (2022-12-05T22:14:49Z) - Geometric Representation Learning for Document Image Rectification [137.75133384124976]
We present DocGeoNet for document image rectification by introducing explicit geometric representation.
Our motivation arises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image.
Experiments show the effectiveness of our framework and demonstrate the superiority of our framework over state-of-the-art methods.
arXiv Detail & Related papers (2022-10-15T01:57:40Z) - DocEnTr: An End-to-End Document Image Enhancement Transformer [13.108797370734893]
Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties.
We present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images.
arXiv Detail & Related papers (2022-01-25T11:45:35Z) - DocScanner: Robust Document Image Rectification with Progressive
Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
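The progressive correction in the DocScanner summary can be pictured as a small recurrent loop that keeps a single estimate and repeatedly applies a shared update block. This is a schematic sketch under assumed module names (`update_cell` is hypothetical), not DocScanner's actual architecture.

```python
import torch
import torch.nn as nn

class ProgressiveRectifier(nn.Module):
    """Schematic: keep one flow estimate and refine it over several steps."""

    def __init__(self, update_cell: nn.Module, steps: int = 6):
        super().__init__()
        # Shared, lightweight update block; assumed to map the concatenated
        # (features + current flow) tensor to a 2-channel flow residual.
        self.update_cell = update_cell
        self.steps = steps

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feats.shape
        flow = torch.zeros(b, 2, h, w, device=feats.device)  # initial estimate
        for _ in range(self.steps):
            # Each iteration predicts a residual correction to the estimate,
            # so the single estimate is progressively refined, never restarted.
            flow = flow + self.update_cell(torch.cat([feats, flow], dim=1))
        return flow
```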
arXiv Detail & Related papers (2021-10-28T09:15:02Z) - DocTr: Document Image Transformer for Geometric Unwarping and
Illumination Correction [99.09177377916369]
We propose Document Image Transformer (DocTr) to address the issue of geometry and illumination distortion of the document images.
Our DocTr achieves 20.02% Character Error Rate (CER), a 15% absolute improvement over the state-of-the-art methods.
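Character Error Rate, quoted in the DocTr summary above, is the Levenshtein (edit) distance between the recognized text and the reference text, divided by the reference length. A small self-contained implementation:

```python
def cer(hyp: str, ref: str) -> float:
    """Character Error Rate: edit distance / reference length."""
    # Classic dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (h != r)))  # substitution
        prev = curr
    return prev[len(ref)] / max(len(ref), 1)

# Example: cer("hel1o", "hello") == 0.2
# (one substitution out of five reference characters).
```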
arXiv Detail & Related papers (2021-10-25T13:27:10Z) - Can You Read Me Now? Content Aware Rectification using Angle Supervision [14.095728009592763]
We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification.
Our method surpasses previous approaches in terms of OCR accuracy, geometric error and visual similarity.
arXiv Detail & Related papers (2020-08-05T16:58:13Z)