Two-stage generative adversarial networks for document image
binarization with color noise and background removal
- URL: http://arxiv.org/abs/2010.10103v3
- Date: Tue, 27 Apr 2021 08:17:44 GMT
- Title: Two-stage generative adversarial networks for document image
binarization with color noise and background removal
- Authors: Sungho Suh, Jihun Kim, Paul Lukowicz and Yong Oh Lee
- Abstract summary: We propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks.
In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image.
In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size.
- Score: 7.639067237772286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document image enhancement and binarization methods are often used to improve
the accuracy and efficiency of document image analysis tasks such as text
recognition. Traditional non-machine-learning methods are constructed on
low-level features in an unsupervised manner but have difficulty with
binarization on documents with severely degraded backgrounds. Convolutional
neural network-based methods focus only on grayscale images and on local
textual features. In this paper, we propose a two-stage color document image
enhancement and binarization method using generative adversarial neural
networks. In the first stage, four color-independent adversarial networks are
trained to extract color foreground information from an input image for
document image enhancement. In the second stage, two independent adversarial
networks with global and local features are trained for image binarization of
documents of variable size. For the adversarial neural networks, we formulate
loss functions between a discriminator and generators having an encoder-decoder
structure. Experimental results show that the proposed method achieves better
performance than many classical and state-of-the-art algorithms over the
Document Image Binarization Contest (DIBCO) datasets, the LRDE Document
Binarization Dataset (LRDE DBD), and our shipping label image dataset. We plan
to release the shipping label dataset as well as our implementation code at
github.com/opensuh/DocumentBinarization/.
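For a concrete picture of the pipeline, the sketch below (PyTorch) illustrates the two-stage structure described in the abstract: color-independent encoder-decoder generators for enhancement, followed by an adversarially trained binarization stage. The layer sizes, the RGB-plus-grayscale branch split, the PatchGAN-style discriminator, and the loss weight are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal, illustrative sketch of the two-stage idea (not the authors' code).
import torch
import torch.nn as nn


class EncoderDecoderGenerator(nn.Module):
    """Small encoder-decoder used as a stand-in for the paper's generators."""

    def __init__(self, in_channels: int, out_channels: int = 1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator scoring local realism of (input, output) pairs."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.net(x)


# Stage 1: four color-independent generators, assumed here to be one per
# RGB channel plus one for a grayscale copy, each producing an enhanced map.
stage1_generators = nn.ModuleList(
    [EncoderDecoderGenerator(in_channels=1) for _ in range(4)]
)
# Stage 2: a generator that binarizes the stacked enhanced maps.
stage2_generator = EncoderDecoderGenerator(in_channels=4)
discriminator = PatchDiscriminator(in_channels=4 + 1)

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()


def generator_loss(enhanced, binary_pred, binary_gt, disc):
    """Adversarial + pixel reconstruction loss for the stage-2 generator."""
    fake_pair = torch.cat([enhanced, binary_pred], dim=1)
    fake_score = disc(fake_pair)
    adv = bce(fake_score, torch.ones_like(fake_score))
    return adv + 50.0 * l1(binary_pred, binary_gt)  # weight is illustrative


# Example forward pass on a dummy 3-channel document image.
image = torch.rand(1, 3, 256, 256)
gray = image.mean(dim=1, keepdim=True)
channels = [image[:, i:i + 1] for i in range(3)] + [gray]
enhanced = torch.cat(
    [g(c) for g, c in zip(stage1_generators, channels)], dim=1
)  # (1, 4, 256, 256) stacked enhanced maps
binary = stage2_generator(enhanced)  # (1, 1, 256, 256) soft binarization map
target = (torch.rand(1, 1, 256, 256) > 0.5).float()  # dummy ground-truth mask
loss = generator_loss(enhanced, binary, target, discriminator)
```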
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z) - Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in a real-time manner.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z) - A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical
Document Image Enhancement [13.27528507177775]
We propose textbfT2T-BinFormer which is a novel document binarization encoder-decoder architecture based on a Tokens-to-token vision transformer.
Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing CNN and ViT-based state-of-the-art methods.
arXiv Detail & Related papers (2023-12-06T23:01:11Z) - DocBinFormer: A Two-Level Transformer Network for Effective Document
Image Binarization [17.087982099845156]
Document binarization is a fundamental and crucial step for achieving the most optimal performance in any document analysis task.
We propose DocBinFormer, a novel two-level vision transformer (TL-ViT) architecture based on vision transformers for effective document image binarization.
arXiv Detail & Related papers (2023-12-06T16:01:29Z) - CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using
Discrete Wavelet Transform for Document Image Binarization [3.0175628677371935]
This paper introduces a novelty method employing generative adversarial networks based on color channel.
The proposed method involves three stages: image preprocessing, image enhancement, and image binarization.
The experimental results demonstrate that CCDWT-GAN achieves a top two performance on multiple benchmark datasets.
arXiv Detail & Related papers (2023-05-27T08:55:56Z) - Document Image Binarization in JPEG Compressed Domain using Dual
Discriminator Generative Adversarial Networks [0.0]
The proposed model has been thoroughly tested with different versions of DIBCO dataset having challenges like holes, erased or smudged ink, dust, and misplaced fibres.
The model proved to be highly robust, efficient both in terms of time and space complexities, and also resulted in state-of-the-art performance in JPEG compressed domain.
arXiv Detail & Related papers (2022-09-13T12:07:32Z) - CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image framework (CRIS)
CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment.
Our proposed framework significantly outperforms state-of-the-art methods without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z) - Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z) - Text Extraction and Restoration of Old Handwritten Documents [3.514869837986596]
This paper describes two novel methods for the restoration of old degraded handwritten documents using deep neural network.
A small-scale dataset of 26 heritage letters images is introduced.
Experiments demonstrate that the proposed systems perform well on handwritten document images with severe degradations.
arXiv Detail & Related papers (2020-01-23T05:42:39Z) - Supervised and Unsupervised Learning of Parameterized Color Enhancement [112.88623543850224]
We tackle the problem of color enhancement as an image translation task using both supervised and unsupervised learning.
We achieve state-of-the-art results compared to both supervised (paired data) and unsupervised (unpaired data) image enhancement methods on the MIT-Adobe FiveK benchmark.
We show the generalization capability of our method, by applying it on photos from the early 20th century and to dark video frames.
arXiv Detail & Related papers (2019-12-30T13:57:06Z)