Document Image Binarization in JPEG Compressed Domain using Dual
Discriminator Generative Adversarial Networks
- URL: http://arxiv.org/abs/2209.05921v1
- Date: Tue, 13 Sep 2022 12:07:32 GMT
- Authors: Bulla Rajesh and Manav Kamlesh Agrawal and Milan Bhuva and Kisalaya
Kishore and Mohammed Javed
- Abstract summary: The proposed model has been thoroughly tested on different versions of the DIBCO dataset, which pose challenges such as holes, erased or smudged ink, dust, and misplaced fibres.
The model proved to be highly robust and efficient in terms of both time and space complexity, and achieved state-of-the-art performance in the JPEG compressed domain.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image binarization techniques are widely used to enhance
noisy and/or degraded images for Document Image Analysis (DIA)
applications such as word spotting, document retrieval, and OCR. Most
existing techniques feed pixel images into Convolutional Neural
Networks to accomplish document binarization, which may not produce effective
results when working with compressed images that need to be processed without
full decompression. Therefore, this research paper proposes document
image binarization directly on the JPEG compressed stream of document images
by employing Dual Discriminator Generative Adversarial Networks
(DD-GANs). Here, the two discriminator networks, Global and Local, work on
different image ratios, and focal loss is used as the generator loss. The proposed model
has been thoroughly tested on different versions of the DIBCO dataset, which pose
challenges such as holes, erased or smudged ink, dust, and misplaced fibres. The
model proved to be highly robust and efficient in terms of both time and space
complexity, and achieved state-of-the-art performance in the JPEG
compressed domain.
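The abstract names focal loss as the generator loss but does not give its exact form here. As a minimal sketch, assuming the commonly used binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), the per-pixel computation might look like the following. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are illustrative assumptions, not values taken from the paper.

```python
import math

def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over flat sequences of probabilities and 0/1 labels.

    Hypothetical helper for illustration; gamma and alpha are assumed defaults.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        # (1 - p_t)^gamma down-weights pixels that are already well classified
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

# A confidently correct pixel contributes far less than a poorly predicted one.
easy = binary_focal_loss([0.95], [1])
hard = binary_focal_loss([0.30], [1])
print(easy < hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted binary cross-entropy. Focal loss is typically chosen when one class is rare and the other dominates, as text pixels usually are against a page background.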
Related papers
- A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical
Document Image Enhancement [13.27528507177775]
We propose T2T-BinFormer, a novel document binarization encoder-decoder architecture based on a Tokens-to-token vision transformer.
Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing CNN and ViT-based state-of-the-art methods.
arXiv Detail & Related papers (2023-12-06T23:01:11Z)
- DWT-CompCNN: Deep Image Classification Network for High Throughput JPEG
2000 Compressed Documents [0.9405458160620535]
DWT CompCNN is proposed for classification of documents that are compressed using High Throughput JPEG 2000 (HTJ2K) algorithm.
The proposed model is time and space efficient, and also achieves a better classification accuracy in compressed domain.
arXiv Detail & Related papers (2023-06-02T08:33:58Z)
- CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using
Discrete Wavelet Transform for Document Image Binarization [3.0175628677371935]
This paper introduces a novel method employing generative adversarial networks based on color channels.
The proposed method involves three stages: image preprocessing, image enhancement, and image binarization.
The experimental results demonstrate that CCDWT-GAN achieves top-two performance on multiple benchmark datasets.
arXiv Detail & Related papers (2023-05-27T08:55:56Z)
- DocMAE: Document Image Rectification via Self-supervised Representation
Learning [144.44748607192147]
We present DocMAE, a novel self-supervised framework for document image rectification.
We first mask random patches of the background-excluded document images and then reconstruct the missing pixels.
With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv Detail & Related papers (2023-04-20T14:27:15Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Learned Lossless Compression for JPEG via Frequency-Domain Prediction [50.20577108662153]
We propose a novel framework for learned lossless compression of JPEG images.
To enable learning in the frequency domain, DCT coefficients are partitioned into groups to utilize implicit local redundancy.
An autoencoder-like architecture is designed based on the weight-shared blocks to realize entropy modeling of grouped DCT coefficients.
arXiv Detail & Related papers (2023-03-05T13:15:28Z)
- T2CI-GAN: Text to Compressed Image generation using Generative
Adversarial Network [9.657133242509671]
In practice, most of the visual data are processed and transmitted in the compressed representation form.
The proposed work attempts to generate the visual data directly in the compressed representation form using Deep Convolutional GANs (DCGANs).
The first model is directly trained with JPEG compressed DCT images (compressed domain) to generate the compressed images from text descriptions.
The second model is trained with RGB images (pixel domain) to generate JPEG compressed DCT representation from text descriptions.
arXiv Detail & Related papers (2022-10-01T09:26:25Z)
- Two-stage generative adversarial networks for document image
binarization with color noise and background removal [7.639067237772286]
We propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks.
In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image.
In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size.
arXiv Detail & Related papers (2020-10-20T07:51:50Z)
- Learning to Improve Image Compression without Changing the Standard
Decoder [100.32492297717056]
We propose learning to improve the encoding performance with the standard decoder.
Specifically, a frequency-domain pre-editing method is proposed to optimize the distribution of DCT coefficients.
We do not modify the JPEG decoder and therefore our approach is applicable when viewing images with the widely used standard JPEG decoder.
arXiv Detail & Related papers (2020-09-27T19:24:42Z)
- A U-Net Based Discriminator for Generative Adversarial Networks [86.67102929147592]
We propose an alternative U-Net based discriminator architecture for generative adversarial networks (GANs).
The proposed architecture provides detailed per-pixel feedback to the generator while maintaining the global coherence of synthesized images.
The novel discriminator improves over the state of the art in terms of the standard distribution and image quality metrics.
arXiv Detail & Related papers (2020-02-28T11:16:54Z)
- Discernible Image Compression [124.08063151879173]
This paper aims to produce compressed images by pursuing both appearance and perceptual consistency.
Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images.
Experiments on benchmarks demonstrate that images compressed by using the proposed method can also be well recognized by subsequent visual recognition and detection models.
arXiv Detail & Related papers (2020-02-17T07:35:08Z)