Document Domain Randomization for Deep Learning Document Layout Extraction
- URL: http://arxiv.org/abs/2105.14931v1
- Date: Thu, 20 May 2021 19:16:04 GMT
- Title: Document Domain Randomization for Deep Learning Document Layout Extraction
- Authors: Meng Ling, Jian Chen, Torsten Möller, Petra Isenberg, Tobias Isenberg, Michael Sedlmair, Robert S. Laramee, Han-Wei Shen, Jian Wu, and C. Lee Giles
- Abstract summary: We present document domain randomization (DDR), the first successful transfer of convolutional neural networks (CNNs) trained only on graphically rendered pseudo-paper pages to real-world document segmentation.
DDR renders pseudo-document pages by modeling randomized textual and non-textual contents of interest.
We show that high-fidelity semantic information is not necessary to label semantic classes, but that a style mismatch between training and test data can lower model accuracy.
- Score: 37.97092983885967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present document domain randomization (DDR), the first successful transfer
of convolutional neural networks (CNNs) trained only on graphically rendered
pseudo-paper pages to real-world document segmentation. DDR renders
pseudo-document pages by modeling randomized textual and non-textual contents
of interest, with user-defined layout and font styles to support joint learning
of fine-grained classes. We demonstrate competitive results using our DDR
approach to extract nine document classes from the benchmark CS-150 and papers
published in two domains, namely annual meetings of Association for
Computational Linguistics (ACL) and IEEE Visualization (VIS). We compare DDR to
conditions of style mismatch and of fewer or noisier samples, which are more
easily obtained in the real world. We show that high-fidelity semantic
information is not necessary to label semantic classes, but that a style
mismatch between training and test data can lower model accuracy. Using smaller
training samples had a slightly detrimental effect. Finally, network models
still achieved high test accuracy when correct labels were diluted towards
confusing labels; this behavior holds across several classes.
Related papers
- Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification [0.0]
Efficient categorization of historical documents is crucial for fields such as genealogy, legal research and historical scholarship.
We propose a representational learning strategy that integrates deep learning models such as ResNet and a masked image transformer (DiT) with segmentation-mask embeddings.
arXiv Detail & Related papers (2024-05-23T04:28:50Z)
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes [93.71909293023663]
Cross-modality Aligned Prototypes (CAPro) is a unified contrastive learning framework to learn visual representations with correct semantics.
CAPro achieves new state-of-the-art performance and exhibits robustness to open-set recognition.
arXiv Detail & Related papers (2023-10-15T07:20:22Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
The authors incorporate Self-Supervised Learning (SSL) into the model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
They propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets.
External knowledge from WordNet provides multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification [123.90912800376039]
Online textual documents, e.g., Wikipedia, contain rich visual descriptions about object classes.
We propose I2DFormer, a novel transformer-based ZSL framework that jointly learns to encode images and documents.
Our method leads to highly interpretable results where document words can be grounded in the image regions.
arXiv Detail & Related papers (2022-09-21T12:18:31Z)
- Knowing Where and What: Unified Word Block Pretraining for Document Understanding [11.46378901674016]
We propose UTel, a language model with Unified TExt and layout pre-training.
Specifically, we propose two pre-training tasks: Surrounding Word Prediction (SWP) for the layout learning, and Contrastive learning of Word Embeddings (CWE) for identifying different word blocks.
In this way, the joint training of Masked Layout-Language Modeling (MLLM) and two newly proposed tasks enables the interaction between semantic and spatial features in a unified way.
arXiv Detail & Related papers (2022-07-28T09:43:06Z)
- Semi-Supervised Learning of Semantic Correspondence with Pseudo-Labels [26.542718087103665]
SemiMatch is a semi-supervised solution for establishing dense correspondences across semantically similar images.
Our framework generates pseudo-labels from the model's own predictions between the source and a weakly augmented target, then retrains the model on those pseudo-labels between the source and a strongly augmented target (see the pseudo-labeling sketch after this list).
In experiments, SemiMatch achieves state-of-the-art performance on various benchmarks, especially on PF-Willow by a large margin.
arXiv Detail & Related papers (2022-03-30T03:52:50Z)
- Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
- OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold [6.09170287691728]
We take a step from segmentation-free single line recognition towards segmentation-free multi-line / full page recognition.
We propose a novel and simple neural network module, termed OrigamiNet, that can augment any CTC-trained, fully convolutional single line text recognizer.
We achieve state-of-the-art character error rate on both IAM & ICDAR 2017 HTR benchmarks for handwriting recognition, surpassing all other methods in the literature.
arXiv Detail & Related papers (2020-06-12T22:18:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.