Multiple Document Datasets Pre-training Improves Text Line Detection
With Deep Neural Networks
- URL: http://arxiv.org/abs/2012.14163v2
- Date: Mon, 29 Mar 2021 11:36:02 GMT
- Title: Multiple Document Datasets Pre-training Improves Text Line Detection
With Deep Neural Networks
- Authors: Mélodie Boillet, Christopher Kermorvant, Thierry Paquet
- Abstract summary: We introduce a fully convolutional network for the document layout analysis task.
Our method Doc-UFCN relies on a U-shaped model trained from scratch for detecting objects from historical documents.
We show that Doc-UFCN outperforms state-of-the-art methods on various datasets.
- Score: 2.5352713493505785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a fully convolutional network for the document
layout analysis task. While state-of-the-art methods are using models
pre-trained on natural scene images, our method Doc-UFCN relies on a U-shaped
model trained from scratch for detecting objects from historical documents. We
consider the line segmentation task, and more generally the layout analysis
problem, as a pixel-wise classification task, so our model outputs a
pixel-wise labeling of the input images. We show that Doc-UFCN outperforms
state-of-the-art methods on various datasets and also demonstrate that
parts pre-trained on natural scene images are not required to reach good
results. In addition, we show that pre-training on multiple document datasets
can improve performance. We evaluate the models using various metrics to
ensure a fair and complete comparison between the methods.
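The pixel-wise formulation described in the abstract can be illustrated with a minimal Python sketch. This is not the Doc-UFCN implementation: the network is replaced by a hand-written grid of per-class scores, the two-class layout (0 = background, 1 = text line) is a hypothetical choice, and both the connected-component grouping and the intersection-over-union measure are assumptions about how a pixel labeling might be turned into objects and scored, since the abstract only mentions "various metrics".

```python
from collections import deque


def label_pixels(scores):
    """Assign each pixel the index of its highest-scoring class.

    scores[y][x] is a list of per-class scores for one pixel
    (hypothetical layout: class 0 = background, class 1 = text line).
    """
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in scores]


def connected_components(labels, target=1):
    """Group 4-connected pixels of class `target` into objects.

    Returns one bounding box (x_min, y_min, x_max, y_max) per object.
    This grouping step is an illustrative assumption, not the paper's
    stated post-processing.
    """
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] != target or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited target pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            y0, x0, y1, x1 = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                y0, x0 = min(y0, cy), min(x0, cx)
                y1, x1 = max(y1, cy), max(x1, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx]
                            and labels[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def pixel_iou(pred, truth, target=1):
    """Pixel-level intersection over union for one class."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += (p == target and t == target)
            union += (p == target or t == target)
    return inter / union if union else 1.0
```

Calling `label_pixels` on a score grid yields the label map, `connected_components` turns that map into candidate line boxes, and `pixel_iou` compares a predicted map against a reference map of the same size.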
Related papers
- Image Generation and Learning Strategy for Deep Document Forgery Detection [7.585489507445007]
Recent advancements in deep neural network (DNN) methods for generative tasks may amplify the threat of document forgery.
We construct a training dataset of document forgery images, named FD-VIED, by emulating possible attacks.
In our experiments, we demonstrate that our approach enhances detection performance.
arXiv Detail & Related papers (2023-11-07T01:40:00Z)
- StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training [64.37272287179661]
StrucTexTv2 is an effective document image pre-training framework.
It consists of two self-supervised pre-training tasks: masked image modeling and masked language modeling.
It achieves competitive or even new state-of-the-art performance in various downstream tasks such as image classification, layout analysis, table structure recognition, document OCR, and information extraction.
arXiv Detail & Related papers (2023-03-01T07:32:51Z)
- Unifying Vision, Text, and Layout for Universal Document Processing [105.36490575974028]
We propose a Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation.
Our method sets the state-of-the-art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites.
arXiv Detail & Related papers (2022-12-05T22:14:49Z)
- ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding [52.3895498789521]
We propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement.
We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, and learn the proper reading order of documents.
Experimental results show ERNIE-Layout achieves superior performance on various downstream tasks, setting new state-of-the-art on key information extraction and document question answering.
arXiv Detail & Related papers (2022-10-12T12:59:24Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- DiT: Self-supervised Pre-training for Document Image Transformer [85.78807512344463]
We propose DiT, a self-supervised pre-trained Document Image Transformer model.
We leverage DiT as the backbone network in a variety of vision-based Document AI tasks.
Experimental results show that the self-supervised pre-trained DiT model achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-03-04T15:34:46Z)
- Neural Photometry-guided Visual Attribute Transfer [4.630419389180576]
We present a deep learning-based method for propagating visual material attributes to larger samples of the same or similar materials.
For training, we leverage images of the material taken under multiple illuminations and a dedicated data augmentation policy.
Our model relies on a supervised image-to-image translation framework and is agnostic to the transferred domain.
arXiv Detail & Related papers (2021-12-05T09:22:28Z)
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically identify foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- RectiNet-v2: A stacked network architecture for document image dewarping [16.249023269158734]
We propose an end-to-end CNN architecture that produces distortion-free document images from the warped documents it takes as input.
We train this model on synthetically simulated warped document images to compensate for the lack of natural data.
We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-02-01T19:26:17Z)
- Unsupervised Deep Learning for Handwritten Page Segmentation [0.0]
We present an unsupervised deep learning method for page segmentation.
A siamese neural network is trained to differentiate between patches using their measurable properties.
Our experiments show that the proposed unsupervised method is as effective as typical supervised methods.
arXiv Detail & Related papers (2021-01-19T07:13:38Z)
- Self-Supervised Representation Learning on Document Images [8.927538538637783]
We show that patch-based pre-training performs poorly on document images because of their different structural properties and poor intra-sample semantic information.
We propose two context-aware alternatives to improve performance on the Tobacco-3482 image classification task.
arXiv Detail & Related papers (2020-04-18T10:14:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.