DocMAE: Document Image Rectification via Self-supervised Representation
Learning
- URL: http://arxiv.org/abs/2304.10341v1
- Date: Thu, 20 Apr 2023 14:27:15 GMT
- Title: DocMAE: Document Image Rectification via Self-supervised Representation
Learning
- Authors: Shaokai Liu, Hao Feng, Wengang Zhou, Houqiang Li, Cong Liu, Feng Wu
- Abstract summary: We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
- Score: 144.44748607192147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tremendous efforts have been made on document image rectification, but how to
learn effective representation of such distorted images is still
under-explored. In this paper, we present DocMAE, a novel self-supervised
framework for document image rectification. Our motivation is to encode the
structural cues in document images, i.e., the document boundaries and text
lines, by leveraging a masked autoencoder to benefit rectification. Specifically,
we first mask random patches of the background-excluded document images and
then reconstruct the missing pixels. With such a self-supervised learning
approach, the network is encouraged to learn the intrinsic structure of
deformed documents by restoring document boundaries and missing text lines.
Extensive experiments, including transfer performance on the downstream
rectification task, validate the effectiveness of our method.
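The masking step described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the patch size of 16, the mask ratio of 0.75, and the choice to compute the loss only on masked pixels are all assumptions made for the example, and the reconstruction network itself is omitted.

```python
import numpy as np


def mask_random_patches(image, doc_mask, patch_size=16, mask_ratio=0.75, seed=0):
    """Zero out random square patches of a background-excluded document image.

    image:    float array of shape (H, W, C)
    doc_mask: bool array of shape (H, W), True on document pixels
    All default values are illustrative assumptions, not values from the paper.
    """
    h, w = image.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be a multiple of patch_size")

    # Exclude the background by zeroing every non-document pixel.
    foreground = image * doc_mask[..., None]

    # Choose which patches of the (gh x gw) grid to hide.
    gh, gw = h // patch_size, w // patch_size
    num_patches = gh * gw
    num_masked = int(round(mask_ratio * num_patches))
    rng = np.random.default_rng(seed)
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[rng.permutation(num_patches)[:num_masked]] = True
    patch_mask = patch_mask.reshape(gh, gw)

    # Expand the patch-level mask to pixel resolution.
    pixel_mask = patch_mask.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

    masked_input = foreground.copy()
    masked_input[pixel_mask] = 0.0
    return foreground, masked_input, pixel_mask


def masked_reconstruction_loss(prediction, target, pixel_mask):
    """Mean squared error restricted to the masked (hidden) pixels."""
    diff = (prediction - target) ** 2
    return float(diff[pixel_mask].mean())
```

In a pre-training loop, `masked_input` would be fed to an encoder-decoder network and its output compared against `foreground` with `masked_reconstruction_loss`, so the network can only lower the loss by inferring the hidden boundaries and text lines from the visible patches.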
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
(2024-10-03T14:33:34Z)
- Image Generation and Learning Strategy for Deep Document Forgery Detection [7.585489507445007]
Recent advancements in deep neural network (DNN) methods for generative tasks may amplify the threat of document forgery.
We construct a training dataset of document forgery images, named FD-VIED, by emulating possible attacks.
In our experiments, we demonstrate that our approach enhances detection performance.
(2023-11-07T01:40:00Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It compares favourably with its counterparts in image fidelity and CLIP alignment score, as well as qualitatively, when editing both generated and real images.
(2023-05-10T07:39:14Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
(2023-04-18T08:00:54Z)
- StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training [64.37272287179661]
StrucTexTv2 is an effective document image pre-training framework.
It consists of two self-supervised pre-training tasks: masked image modeling and masked language modeling.
It achieves competitive or even new state-of-the-art performance in various downstream tasks such as image classification, layout analysis, table structure recognition, document OCR, and information extraction.
(2023-03-01T07:32:51Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
(2021-10-28T09:15:02Z)
- Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network [30.18238229156996]
We propose a framework for both rectifying distorted document images and finely removing the background, using a fully convolutional network (FCN).
The FCN is trained by regressing displacements of synthesized distorted documents, and to control the smoothness of displacements, we propose a Local Smooth Constraint (LSC) in regularization.
Experiments show that our approach dewarps document images effectively under various geometric distortions and achieves state-of-the-art performance in terms of local details and overall effect.
(2021-04-14T12:32:36Z)
- RectiNet-v2: A stacked network architecture for document image dewarping [16.249023269158734]
We propose an end-to-end CNN architecture that produces distortion-free document images from the warped documents it takes as input.
We train this model on synthetically simulated warped document images to compensate for the lack of sufficient natural data.
We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
(2021-02-01T19:26:17Z)