A Fair Evaluation of Various Deep Learning-Based Document Image
Binarization Approaches
- URL: http://arxiv.org/abs/2401.11831v1
- Date: Mon, 22 Jan 2024 10:42:51 GMT
- Title: A Fair Evaluation of Various Deep Learning-Based Document Image
Binarization Approaches
- Authors: Richin Sukesh, Mathias Seuret, Anguelos Nicolaou, Martin Mayr, Vincent
Christlein
- Abstract summary: Binarization of document images is an important pre-processing step in the field of document analysis.
Deep learning techniques are able to generate binarized versions of the images by learning context-dependent features.
This work focuses on the evaluation of different deep learning-based methods under the same evaluation protocol.
- Score: 5.393847875065119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binarization of document images is an important pre-processing step in the
field of document analysis. Traditional image binarization techniques usually
rely on histograms or local statistics to identify a valid threshold to
differentiate between different aspects of the image. Deep learning techniques
are able to generate binarized versions of the images by learning
context-dependent features that are less sensitive to the degradation typically
occurring in document images. In recent years, many deep learning-based methods
have been developed for document binarization. But which one to choose? There
have been no studies that compare these methods rigorously. Therefore, this
work focuses on the evaluation of different deep learning-based methods under
the same evaluation protocol. We evaluate them on different Document Image
Binarization Contest (DIBCO) datasets and obtain very heterogeneous results. We
show that the DE-GAN model was able to perform better compared to other models
when evaluated on the DIBCO2013 dataset while DP-LinkNet performed best on the
DIBCO2017 dataset. The 2-StageGAN performed best on the DIBCO2018 dataset while
SauvolaNet outperformed the others on the DIBCO2019 challenge. Finally, we make
the code, all models and evaluation publicly available
(https://github.com/RichSu95/Document_Binarization_Collection) to ensure
reproducibility and simplify future binarization evaluations.
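The traditional techniques the abstract contrasts with deep learning can be sketched in a few lines. The snippet below is a minimal illustration, not code from the paper or its repository: Otsu's method stands in for a histogram-based global threshold, and Sauvola's method for a threshold derived from local statistics. The function names and the defaults (`window=15`, `k=0.2`, `r=128`) are assumptions chosen for illustration, and the input is assumed to be an 8-bit grayscale array with dark ink on a bright background.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Global threshold from the histogram (Otsu): maximize between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                          # skip thresholds with an empty class
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def sauvola_threshold(gray: np.ndarray, window: int = 15,
                      k: float = 0.2, r: float = 128.0) -> np.ndarray:
    """Per-pixel threshold from local mean m and std s: T = m * (1 + k * (s / r - 1)).

    `window` must be odd and smaller than both image dimensions.
    """
    h, w = gray.shape
    pad = window // 2
    img = np.pad(gray.astype(np.float64), pad, mode="reflect")
    # Integral images with a leading zero row/column give O(1) box sums.
    ii = np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    ii2 = np.pad(img ** 2, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

    def box_sum(s: np.ndarray) -> np.ndarray:
        return (s[window:window + h, window:window + w]
                - s[:h, window:window + w]
                - s[window:window + h, :w]
                + s[:h, :w])

    n = window * window
    mean = box_sum(ii) / n
    var = np.maximum(box_sum(ii2) / n - mean ** 2, 0.0)  # clamp rounding noise
    return mean * (1.0 + k * (np.sqrt(var) / r - 1.0))


def binarize(gray: np.ndarray, method: str = "otsu") -> np.ndarray:
    """Return a boolean image: True = background, False = ink."""
    if method == "otsu":
        return gray > otsu_threshold(gray)
    return gray > sauvola_threshold(gray)
```

Both functions decide each pixel from intensity statistics alone, which is why stains, bleed-through, or uneven illumination can mislead them; the learned models compared in the paper aim to use wider context instead.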
Related papers
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement [13.27528507177775]
We propose T2T-BinFormer, a novel document binarization encoder-decoder architecture based on a Tokens-to-token vision transformer.
Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing CNN and ViT-based state-of-the-art methods.
arXiv Detail & Related papers (2023-12-06T23:01:11Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
The training of an efficacious deep learning model requires large data with diverse styles and qualities.
A novel contrastive learning scheme is developed to equip the deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Variational Augmentation for Enhancing Historical Document Image Binarization [11.342730352935913]
Historical Document Image Binarization is a well-known segmentation problem in image processing.
We propose a novel two-stage framework, the first stage of which comprises a generator that produces degraded samples using variational inference.
The second is a CNN-based binarization network that trains on the generated data.
arXiv Detail & Related papers (2022-11-12T06:01:21Z)
- Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing [60.67014034968582]
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents.
Deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations.
The proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works.
arXiv Detail & Related papers (2022-08-04T01:39:37Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task need to feed a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks [2.5352713493505785]
We introduce a fully convolutional network for the document layout analysis task.
Our method Doc-UFCN relies on a U-shaped model trained from scratch for detecting objects from historical documents.
We show that Doc-UFCN outperforms state-of-the-art methods on various datasets.
arXiv Detail & Related papers (2020-12-28T09:48:33Z)
- Two-stage generative adversarial networks for document image binarization with color noise and background removal [7.639067237772286]
We propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks.
In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image.
In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size.
arXiv Detail & Related papers (2020-10-20T07:51:50Z)