Unsupervised Neural Domain Adaptation for Document Image Binarization
- URL: http://arxiv.org/abs/2012.01204v1
- Date: Wed, 2 Dec 2020 13:42:38 GMT
- Authors: Francisco J. Castellanos, Antonio-Javier Gallego, Jorge Calvo-Zaragoza
- Abstract summary: This paper proposes a method that combines neural networks and Domain Adaptation (DA) in order to carry out unsupervised document binarization.
Results show that our proposal successfully deals with the binarization of new document domains without the need for labeled data.
- Score: 13.848843012433187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binarization is a well-known image processing task, whose objective is to
separate the foreground of an image from the background. One of the many tasks
for which it is useful is that of preprocessing document images in order to
identify relevant information, such as text or symbols. The wide variety of
document types, typologies, alphabets, and formats makes binarization
challenging, and there are, therefore, multiple proposals with which to solve
this problem, from classical manually-adjusted methods, to more recent
approaches based on machine learning. The latter techniques require a large
amount of training data in order to obtain good results; however, labeling a
portion of each existing collection of documents is not feasible in practice.
This is a common problem in supervised learning, which can be addressed by
using the so-called Domain Adaptation (DA) techniques. These techniques take
advantage of the knowledge learned in one domain, for which labeled data are
available, to apply it to other domains for which there are no labeled data.
This paper proposes a method that combines neural networks and DA in order to
carry out unsupervised document binarization. However, when both the source and
target domains are very similar, this adaptation could be detrimental. Our
methodology, therefore, first measures the similarity between domains in an
innovative manner in order to determine whether or not it is appropriate to
apply the adaptation process. The results reported in the experimentation, when
evaluating up to 20 possible combinations among five different domains, show
that our proposal successfully deals with the binarization of new document
domains without the need for labeled data.
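The abstract does not say how the similarity between domains is measured, so the following is only a minimal sketch of the gating idea: compare the two domains first, and apply adaptation only when they differ enough. Everything in it is an assumption made for illustration, not the authors' method: the use of histogram intersection over grayscale intensities, the function names `intensity_histogram`, `domain_similarity` and `should_adapt`, the bin count, and the threshold value of 0.8.

```python
from typing import List, Sequence

# A grayscale image as nested sequences of pixel values in 0..255.
Image = Sequence[Sequence[int]]


def intensity_histogram(images: Sequence[Image], bins: int = 16) -> List[float]:
    """Normalized histogram of pixel intensities over a set of images."""
    counts = [0] * bins
    total = 0
    for image in images:
        for row in image:
            for pixel in row:
                # Map 0..255 onto bin indices 0..bins-1.
                counts[min(pixel * bins // 256, bins - 1)] += 1
                total += 1
    if total == 0:
        raise ValueError("no pixels to build a histogram from")
    return [count / total for count in counts]


def domain_similarity(
    source: Sequence[Image], target: Sequence[Image], bins: int = 16
) -> float:
    """Histogram intersection in [0, 1]; 1.0 means identical distributions.

    Assumption: this stands in for the paper's similarity measure, which
    the abstract does not describe.
    """
    source_hist = intensity_histogram(source, bins)
    target_hist = intensity_histogram(target, bins)
    return sum(min(s, t) for s, t in zip(source_hist, target_hist))


def should_adapt(
    source: Sequence[Image], target: Sequence[Image], threshold: float = 0.8
) -> bool:
    """Return True when the domains differ enough to justify adaptation.

    The threshold is a hypothetical value chosen for the example.
    """
    return domain_similarity(source, target) < threshold


if __name__ == "__main__":
    dark_pages = [[[0, 10], [5, 3]]]
    bright_pages = [[[250, 255], [240, 245]]]
    print(should_adapt(dark_pages, bright_pages))
    print(should_adapt(dark_pages, dark_pages))
```

In this sketch, a source-trained model would be used directly when `should_adapt` returns `False`, which mirrors the abstract's remark that adaptation could be detrimental when the source and target domains are very similar.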
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Multimodal Side-Tuning for Document Classification [3.0229888038442914]
Side-tuning is a methodology for network adaptation recently introduced to solve some of the problems related to previous approaches.
We show that side-tuning can be successfully employed also when different data sources are considered.
arXiv Detail & Related papers (2023-01-16T11:08:03Z)
- Unifying Vision, Text, and Layout for Universal Document Processing [105.36490575974028]
We propose a Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation.
Our method sets the state-of-the-art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites.
arXiv Detail & Related papers (2022-12-05T22:14:49Z)
- Domain Agnostic Few-Shot Learning For Document Intelligence [4.243926243206826]
Few-shot learning aims to generalize to novel classes with only a few samples with class labels.
In this work, we address the problem of few-shot document image classification under domain shift.
arXiv Detail & Related papers (2021-10-29T03:19:31Z)
- Domain Adaptive Semantic Segmentation without Source Data [50.18389578589789]
We investigate domain adaptive semantic segmentation without source data, which assumes that the model is pre-trained on the source domain.
We propose an effective framework for this challenging problem with two components: positive learning and negative learning.
Our framework can be easily implemented and incorporated with other methods to further enhance the performance.
arXiv Detail & Related papers (2021-10-13T04:12:27Z)
- Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation [78.28390172958643]
We identify two key aspects that can help to alleviate multiple domain-shifts in the multi-target domain adaptation (MTDA).
We propose Curriculum Graph Co-Teaching (CGCT) that uses a dual classifier head, with one of them being a graph convolutional network (GCN) which aggregates features from similar samples across the domains.
When the domain labels are available, we propose Domain-aware Curriculum Learning (DCL), a sequential adaptation strategy that first adapts on the easier target domains, followed by the harder ones.
arXiv Detail & Related papers (2021-04-01T23:41:41Z)
- Towards Recognizing New Semantic Concepts in New Visual Domains [9.701036831490768]
We argue that it is crucial to design deep architectures that can operate in previously unseen visual domains and recognize novel semantic concepts.
In the first part of the thesis, we describe different solutions to enable deep models to generalize to new visual domains.
In the second part, we show how to extend the knowledge of a pretrained deep model to new semantic concepts, without access to the original training set.
arXiv Detail & Related papers (2020-12-16T16:23:40Z)
- mDALU: Multi-Source Domain Adaptation and Label Unification with Partial Datasets [102.62639692656458]
This paper treats this task as a multi-source domain adaptation and label unification problem.
Our method consists of a partially-supervised adaptation stage and a fully-supervised adaptation stage.
We verify the method on three different tasks, image classification, 2D semantic image segmentation, and joint 2D-3D semantic segmentation.
arXiv Detail & Related papers (2020-12-15T15:58:03Z)
- Two-stage generative adversarial networks for document image binarization with color noise and background removal [7.639067237772286]
We propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks.
In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image.
In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size.
arXiv Detail & Related papers (2020-10-20T07:51:50Z)
- A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [81.07994783143533]
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks.
In many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data.
To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.
arXiv Detail & Related papers (2020-09-01T00:06:50Z)