CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using
Discrete Wavelet Transform for Document Image Binarization
- URL: http://arxiv.org/abs/2305.17420v2
- Date: Thu, 24 Aug 2023 06:39:18 GMT
- Title: CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using
Discrete Wavelet Transform for Document Image Binarization
- Authors: Rui-Yang Ju, Yu-Shian Lin, Jen-Shiun Chiang, Chih-Chia Chen, Wei-Han
Chen, Chun-Tse Chien
- Abstract summary: This paper introduces a novel method employing generative adversarial networks based on color channel.
The proposed method involves three stages: image preprocessing, image enhancement, and image binarization.
The experimental results demonstrate that CCDWT-GAN achieves top-two performance on multiple benchmark datasets.
- Score: 3.0175628677371935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficiently extracting textual information from degraded color document
images is a significant research area. The prolonged imperfect preservation of
ancient documents has led to various types of degradation, such as page
staining, paper yellowing, and ink bleeding. These types of degradation adversely
affect image processing for feature extraction. This paper introduces a
novel method employing generative adversarial networks based on color channel
using discrete wavelet transform (CCDWT-GAN). The proposed method involves
three stages: image preprocessing, image enhancement, and image binarization.
In the initial step, we apply discrete wavelet transform (DWT) to retain the
low-low (LL) subband image, thereby enhancing image quality. Subsequently, we
divide the original input image into four single-channel colors (red, green,
blue, and gray) to separately train adversarial networks. For the extraction of
global and local features, we utilize the output image from the image
enhancement stage and the entire input image to train adversarial networks
independently, and then combine these two results as the final output. To
validate the positive impact of the image enhancement and binarization stages
on model performance, we conduct an ablation study. This work compares the
performance of the proposed method with other state-of-the-art (SOTA) methods
on DIBCO and H-DIBCO ((Handwritten) Document Image Binarization Competition)
datasets. The experimental results demonstrate that CCDWT-GAN achieves
top-two performance on multiple benchmark datasets. Notably, on the DIBCO 2013 and 2016
datasets, our method achieves F-measure (FM) values of 95.24 and 91.46,
respectively.
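As a rough illustration of the steps described in the abstract, the following Python sketch splits an RGB image into the four single-channel images (red, green, blue, gray), keeps the LL subband of a one-level 2D DWT, and computes the F-measure of a binarized result. It is not the authors' implementation: the Haar wavelet, the BT.601 gray weights, the convention that text pixels have value 0, and all function names are assumptions chosen for the example. Plain Python lists are used only to keep the sketch dependency-free; a practical version would more likely rely on an array library.

```python
# Illustrative sketch only; names and constants are assumptions, not the paper's code.

def split_channels(rgb):
    """Split an H x W image of (r, g, b) tuples into four single-channel images."""
    red = [[p[0] for p in row] for row in rgb]
    green = [[p[1] for p in row] for row in rgb]
    blue = [[p[2] for p in row] for row in rgb]
    # Assumed gray conversion: ITU-R BT.601 luma weights.
    gray = [[0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2] for p in row] for row in rgb]
    return {"red": red, "green": green, "blue": blue, "gray": gray}


def haar_ll(channel):
    """Return the LL subband of a one-level 2D orthonormal Haar DWT.

    Each output value is (a + b + c + d) / 2 over one 2x2 block, which is the
    result of low-pass filtering along rows and then along columns.
    """
    h, w = len(channel), len(channel[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    return [
        [
            (channel[r][c] + channel[r][c + 1]
             + channel[r + 1][c] + channel[r + 1][c + 1]) / 2.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def f_measure(pred, truth, text=0):
    """F-measure in percent, treating pixels equal to `text` as the positive class."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == text and t == text:
                tp += 1
            elif p == text:
                fp += 1
            elif t == text:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    image = [[(10, 20, 30)] * 4 for _ in range(4)]  # tiny synthetic 4x4 image
    channels = split_channels(image)
    ll_images = {name: haar_ll(ch) for name, ch in channels.items()}
    print({name: len(ll) for name, ll in ll_images.items()})
    print(f_measure([[0, 0], [255, 255]], [[0, 255], [255, 255]]))
```

In this sketch each LL image has half the height and width of its input channel, so four smaller single-channel images would be passed on to the later stages.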
Related papers
- Transforming Color: A Novel Image Colorization Method [8.041659727964305]
This paper introduces a novel method for image colorization that utilizes a color transformer and generative adversarial networks (GANs).
The proposed method integrates a transformer architecture to capture global information and a GAN framework to improve visual quality.
Experimental results show that the proposed network significantly outperforms other state-of-the-art colorization techniques.
arXiv Detail & Related papers (2024-10-07T07:23:42Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z) - DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image
Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z) - A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical
Document Image Enhancement [13.27528507177775]
We propose T2T-BinFormer, a novel document binarization encoder-decoder architecture based on a Tokens-to-token vision transformer.
Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing CNN and ViT-based state-of-the-art methods.
arXiv Detail & Related papers (2023-12-06T23:01:11Z) - Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z) - Three-stage binarization of color document images based on discrete wavelet transform and generative adversarial networks [0.0]
This work proposes an effective three-stage network method for image enhancement and binarization of degraded documents using generative adversarial networks (GANs).
The experimental results show that the Avg-Score metrics of the proposed method are 77.64, 77.95, 79.05, 76.38, 75.34, and 77.00 on the (H)-DIBCO 2011, 2013, 2014, 2016, 2017, and 2018 datasets.
arXiv Detail & Related papers (2022-11-29T11:17:34Z) - Document Image Binarization in JPEG Compressed Domain using Dual
Discriminator Generative Adversarial Networks [0.0]
The proposed model has been thoroughly tested on different versions of the DIBCO dataset, which pose challenges such as holes, erased or smudged ink, dust, and misplaced fibres.
The model proved to be highly robust, efficient both in terms of time and space complexities, and also resulted in state-of-the-art performance in JPEG compressed domain.
arXiv Detail & Related papers (2022-09-13T12:07:32Z) - Palette: Image-to-Image Diffusion Models [50.268441533631176]
We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models.
On four challenging image-to-image translation tasks, Palette outperforms strong GAN and regression baselines.
We report several sample quality scores including FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against reference images.
arXiv Detail & Related papers (2021-11-10T17:49:29Z) - Two-stage generative adversarial networks for document image
binarization with color noise and background removal [7.639067237772286]
We propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks.
In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image.
In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size.
arXiv Detail & Related papers (2020-10-20T07:51:50Z) - Instance-aware Image Colorization [51.12040118366072]
In this paper, we propose a method for achieving instance-aware colorization.
Our network architecture leverages an off-the-shelf object detector to obtain cropped object images.
We use a similar network to extract the full-image features and apply a fusion module to predict the final colors.
arXiv Detail & Related papers (2020-05-21T17:59:23Z) - Supervised and Unsupervised Learning of Parameterized Color Enhancement [112.88623543850224]
We tackle the problem of color enhancement as an image translation task using both supervised and unsupervised learning.
We achieve state-of-the-art results compared to both supervised (paired data) and unsupervised (unpaired data) image enhancement methods on the MIT-Adobe FiveK benchmark.
We show the generalization capability of our method by applying it to photos from the early 20th century and to dark video frames.
arXiv Detail & Related papers (2019-12-30T13:57:06Z)