RectiNet-v2: A stacked network architecture for document image dewarping
- URL: http://arxiv.org/abs/2102.01120v1
- Date: Mon, 1 Feb 2021 19:26:17 GMT
- Title: RectiNet-v2: A stacked network architecture for document image dewarping
- Authors: Hmrishav Bandyopadhyay, Tanmoy Dasgupta, Nibaran Das, Mita Nasipuri
- Abstract summary: We propose an end-to-end CNN architecture that can produce distortion free document images from warped documents it takes as input.
We train this model on warped document images simulated synthetically to compensate for lack of enough natural data.
We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
- Score: 16.249023269158734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advent of mobile and hand-held cameras, document images have found
their way into almost every domain. Dewarping of these images for the removal
of perspective distortions and folds is essential so that they can be
understood by document recognition algorithms. For this, we propose an
end-to-end CNN architecture that can produce distortion free document images
from warped documents it takes as input. We train this model on warped document
images simulated synthetically to compensate for lack of enough natural data.
Our method is novel in the use of a bifurcated decoder with shared weights to
prevent intermingling of grid coordinates, in the use of residual networks in
the U-Net skip connections to allow flow of data from different receptive
fields in the model, and in the use of a gated network to help the model focus
on structure and line level detail of the document image. We evaluate our
method on the DocUNet dataset, a benchmark in this domain, and obtain results
comparable to state-of-the-art methods.
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical
Document Image Enhancement [13.27528507177775]
We propose textbfT2T-BinFormer which is a novel document binarization encoder-decoder architecture based on a Tokens-to-token vision transformer.
Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing CNN and ViT-based state-of-the-art methods.
arXiv Detail & Related papers (2023-12-06T23:01:11Z) - iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z) - DocMAE: Document Image Rectification via Self-supervised Representation
Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z) - Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z) - UVDoc: Neural Grid-based Document Unwarping [20.51368640747448]
Restoring the original, flat appearance of a printed document from casual photographs is a common everyday problem.
We propose a novel method for grid-based single-image document unwarping.
Our method performs geometric distortion correction via a fully convolutional deep neural network.
arXiv Detail & Related papers (2023-02-06T15:53:34Z) - Geometric Representation Learning for Document Image Rectification [137.75133384124976]
We present DocGeoNet for document image rectification by introducing explicit geometric representation.
Our motivation arises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image.
Experiments show the effectiveness of our framework and demonstrate the superiority of our framework over state-of-the-art methods.
arXiv Detail & Related papers (2022-10-15T01:57:40Z) - Fourier Document Restoration for Robust Document Dewarping and
Recognition [73.44057202891011]
This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions.
It dewarps documents by a flexible Thin-Plate Spline transformation which can handle various deformations effectively without requiring deformation annotations in training.
It outperforms the state-of-the-art by large margins on both dewarping and text recognition tasks.
arXiv Detail & Related papers (2022-03-18T12:39:31Z) - Multiple Document Datasets Pre-training Improves Text Line Detection
With Deep Neural Networks [2.5352713493505785]
We introduce a fully convolutional network for the document layout analysis task.
Our method Doc-UFCN relies on a U-shaped model trained from scratch for detecting objects from historical documents.
We show that Doc-UFCN outperforms state-of-the-art methods on various datasets.
arXiv Detail & Related papers (2020-12-28T09:48:33Z) - Two-stage generative adversarial networks for document image
binarization with color noise and background removal [7.639067237772286]
We propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks.
In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image.
In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size.
arXiv Detail & Related papers (2020-10-20T07:51:50Z) - A Gated and Bifurcated Stacked U-Net Module for Document Image Dewarping [20.591737450565855]
We propose a supervised Gated and Bifurcated Stacked U-Net module to predict a dewarping grid and create a distortion free image from the input.
The novelty in our methods exists not only in a bifurcation of the U-Net to help eliminate the intermingling of the grid coordinates, but also in the use of a gated network which adds boundary and other minute line level details to the model.
arXiv Detail & Related papers (2020-07-20T01:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.