UDBNET: Unsupervised Document Binarization Network via Adversarial Game
- URL: http://arxiv.org/abs/2007.07075v2
- Date: Tue, 27 Oct 2020 09:58:28 GMT
- Title: UDBNET: Unsupervised Document Binarization Network via Adversarial Game
- Authors: Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal
- Abstract summary: We present a novel approach to document image binarization by introducing a three-player min-max adversarial game.
In our approach, an Adversarial Texture Augmentation Network (ATANet) first superimposes the texture of a degraded reference image over a clean image.
The clean image and its generated degraded version constitute the pseudo paired data used to train the Unsupervised Document Binarization Network (UDBNet).
- Score: 26.60652038277151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Degraded document image binarization is one of the most challenging tasks in
the domain of document image analysis. In this paper, we present a novel
approach to document image binarization by introducing a three-player
min-max adversarial game. We train the network in an unsupervised setup,
assuming that no paired training data are available. In our approach, an
Adversarial Texture Augmentation Network (ATANet) first superimposes the
texture of a degraded reference image over a clean image. The clean
image and its generated degraded version then constitute the pseudo
paired data used to train the Unsupervised Document Binarization
Network (UDBNet). This approach also enlarges the document
binarization datasets, since it generates multiple images that share the same
content features but have different textural features. These generated noisy
images are then fed into the UDBNet to recover the clean version. The joint
discriminator, the third player of the three-player min-max adversarial game,
tries to couple the ATANet and the UDBNet. The game
stops when the distributions modelled by the ATANet and the UDBNet align to
the same joint distribution over time. Thus, the joint discriminator forces
the UDBNet to perform better on real degraded images. The experimental results
indicate that the proposed model outperforms existing
state-of-the-art algorithms on the widely used DIBCO datasets. The source code of
the proposed system is publicly available at
https://github.com/VIROBO-15/UDBNET.
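The data flow described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: `toy_texture_overlay` and `toy_binarize` are hypothetical stand-ins (a pixel-wise multiplication and a global threshold) for the learned ATANet and UDBNet, and the joint discriminator is omitted entirely. The sketch only shows how one clean page combined with several degraded references yields several pseudo pairs with the same content but different texture.

```python
from typing import List, Tuple

# Grayscale image as nested lists; 0.0 = ink, 1.0 = blank paper.
Image = List[List[float]]


def toy_texture_overlay(clean: Image, reference: Image) -> Image:
    """Hypothetical stand-in for ATANet: scale each clean pixel by the
    reference texture at the same position (not a learned network)."""
    return [
        [c * r for c, r in zip(clean_row, ref_row)]
        for clean_row, ref_row in zip(clean, reference)
    ]


def toy_binarize(image: Image, threshold: float = 0.5) -> Image:
    """Hypothetical stand-in for UDBNet: a fixed global threshold."""
    return [[1.0 if p >= threshold else 0.0 for p in row] for row in image]


def make_pseudo_pairs(
    clean: Image, references: List[Image]
) -> List[Tuple[Image, Image]]:
    """Build one (degraded, clean) pseudo pair per reference texture."""
    return [(toy_texture_overlay(clean, ref), clean) for ref in references]


# One clean 2x2 "page" and two made-up degraded reference textures.
clean = [[1.0, 0.0], [0.0, 1.0]]
references = [
    [[0.9, 0.8], [0.7, 0.9]],
    [[0.6, 0.95], [0.85, 0.75]],
]

pairs = make_pseudo_pairs(clean, references)
recovered = [toy_binarize(degraded) for degraded, _ in pairs]
```

Here the two degraded versions differ from each other while both map back to the same clean page, which mirrors the dataset-enlargement effect the abstract describes. In the actual method both mappings are learned adversarially rather than fixed.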
Related papers
- Noisy-Correspondence Learning for Text-to-Image Person Re-identification [50.07634676709067]
We propose a novel Robust Dual Embedding method (RDE) to learn robust visual-semantic associations even with noisy correspondences.
Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on three datasets.
(2023-08-19T05:34:13Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
(2023-04-18T08:00:54Z)
- UVDoc: Neural Grid-based Document Unwarping [20.51368640747448]
Restoring the original, flat appearance of a printed document from casual photographs is a common everyday problem.
We propose a novel method for grid-based single-image document unwarping.
Our method performs geometric distortion correction via a fully convolutional deep neural network.
(2023-02-06T15:53:34Z)
- Learning Weighting Map for Bit-Depth Expansion within a Rational Range [64.15915577164894]
Bit-depth expansion (BDE) is one of the emerging technologies to display high bit-depth (HBD) image from low bit-depth (LBD) source.
Existing BDE methods have no unified solution for various BDE situations.
We design a bit restoration network (BRNet) to learn a weight for each pixel, which indicates the ratio of the replenished value within a rational range.
(2022-04-26T02:27:39Z)
- NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To deal with the lack-of-data problem that no paired data exist, we introduce a novel self-supervised strategy to train our network.
(2022-03-20T09:02:13Z)
- RectiNet-v2: A stacked network architecture for document image dewarping [16.249023269158734]
We propose an end-to-end CNN architecture that can produce distortion free document images from warped documents it takes as input.
We train this model on warped document images simulated synthetically to compensate for lack of enough natural data.
We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
(2021-02-01T19:26:17Z)
- The Forchheim Image Database for Camera Identification in the Wild [10.091921099426294]
Forchheim Image Database (FODB) consists of more than 23,000 images of 143 scenes by 27 smartphone cameras.
Each image is provided in 6 different qualities: the original camera-native version, and five copies from social networks.
General-purpose EfficientNet remarkably outperforms several dedicated forensic CNNs both on clean and compressed images.
(2020-11-04T11:54:54Z)
- Two-stage generative adversarial networks for document image binarization with color noise and background removal [7.639067237772286]
We propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks.
In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image.
In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size.
(2020-10-20T07:51:50Z)
- Generate High Resolution Images With Generative Variational Autoencoder [0.0]
We present a novel neural network to generate high resolution images.
We replace the decoder of VAE with a discriminator while using the encoder as it is.
We evaluate our network on 3 different datasets: MNIST, LSUN and CelebA dataset.
(2020-08-12T20:15:34Z)
- Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images [56.652027072552606]
We propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++.
By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image.
A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes to obtain a fused 3D volume.
(2020-06-22T13:48:09Z)
- Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
(2020-06-15T17:51:45Z)