Efficient few-shot learning for pixel-precise handwritten document
layout analysis
- URL: http://arxiv.org/abs/2210.15570v1
- Date: Thu, 27 Oct 2022 16:03:52 GMT
- Title: Efficient few-shot learning for pixel-precise handwritten document
layout analysis
- Authors: Axel De Nardin, Silvia Zottin, Matteo Paier, Gian Luca Foresti,
Emanuela Colombi, Claudio Piciarelli
- Abstract summary: We propose an efficient few-shot learning framework for layout analysis.
It achieves performance comparable to current state-of-the-art fully supervised methods on the publicly available DIVA-HisDB dataset.
- Score: 11.453393410516991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Layout analysis is a task of utmost importance in ancient handwritten
document analysis and represents a fundamental step toward the simplification
of subsequent tasks such as optical character recognition and automatic
transcription. However, many of the approaches adopted to solve this problem
rely on a fully supervised learning paradigm. While these systems achieve very
good performance on this task, the drawback is that pixel-precise text labeling
of the entire training set is a very time-consuming process, which makes this
type of information rarely available in a real-world scenario. In the present
paper, we address this problem by proposing an efficient few-shot learning
framework that achieves performance comparable to current state-of-the-art
fully supervised methods on the publicly available DIVA-HisDB dataset.
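The paper itself does not publish code in this listing, but the core idea of few-shot pixel-precise labeling can be illustrated with a minimal prototype-based sketch: per-class feature prototypes are computed from a handful of labeled pixels, and every remaining pixel is assigned to its nearest prototype. All function names and the toy features below are hypothetical and are not the authors' actual framework.

```python
import numpy as np

def fit_prototypes(features, labels, n_classes):
    """Average the feature vectors of each class's few labeled pixels.

    features: (N, D) array of per-pixel feature vectors
    labels:   (N,) integer class ids in [0, n_classes)
    """
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def predict(features, prototypes):
    """Assign each pixel to the class of the nearest prototype."""
    # (N, C) matrix of squared Euclidean distances via broadcasting
    d = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Toy example: 2-D "features" for pixels of two layout classes
rng = np.random.default_rng(0)
support_feats = np.vstack([rng.normal(0.0, 0.1, (5, 2)),   # class 0: background
                           rng.normal(1.0, 0.1, (5, 2))])  # class 1: text
support_labels = np.array([0] * 5 + [1] * 5)

protos = fit_prototypes(support_feats, support_labels, n_classes=2)
query = np.array([[0.05, -0.02], [0.95, 1.03]])
print(predict(query, protos))
```

In practice the per-pixel features would come from a pretrained backbone rather than random numbers; the point of the sketch is only that a few labeled pixels per class can already induce a dense, pixel-precise labeling.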
Related papers
- UnSupDLA: Towards Unsupervised Document Layout Analysis [11.574592219976823]
A critical but frequently overlooked problem is the scarcity of labeled data needed for layout analysis.
We employ a vision-based approach for analyzing document layouts designed to train a network without labels.
arXiv Detail & Related papers (2024-06-10T13:06:28Z)
- U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts [9.76730765089929]
U-DIADS-Bib is a novel, pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in the fields of computer vision and humanities.
We also propose a novel, computer-aided segmentation pipeline to alleviate the burden of the time-consuming manual annotation process.
arXiv Detail & Related papers (2024-01-16T15:11:18Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition [16.987008461171065]
We explore the potential of continual self-supervised learning to alleviate the catastrophic forgetting problem in handwritten text recognition.
Our method consists of adding intermediate layers, called adapters, for each task, and efficiently distilling knowledge from the previous model while learning the current task.
We attain state-of-the-art performance on English, Italian and Russian scripts, whilst adding only a few parameters per task.
arXiv Detail & Related papers (2023-03-16T14:27:45Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- Towards Unsupervised Sketch-based Image Retrieval [126.77787336692802]
We introduce a novel framework that simultaneously performs unsupervised representation learning and sketch-photo domain alignment.
Our framework achieves excellent performance in the new unsupervised setting, and performs comparably or better than state-of-the-art in the zero-shot setting.
arXiv Detail & Related papers (2021-05-18T02:38:22Z)
- Few-Cost Salient Object Detection with Adversarial-Paced Learning [95.0220555274653]
This paper proposes to learn an effective salient object detection model from manual annotations on only a few training images.
We name this task as the few-cost salient object detection and propose an adversarial-paced learning (APL)-based framework to facilitate the few-cost learning scenario.
arXiv Detail & Related papers (2021-04-05T14:15:49Z)
- Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks [2.5352713493505785]
We introduce a fully convolutional network for the document layout analysis task.
Our method Doc-UFCN relies on a U-shaped model trained from scratch for detecting objects from historical documents.
We show that Doc-UFCN outperforms state-of-the-art methods on various datasets.
arXiv Detail & Related papers (2020-12-28T09:48:33Z)
- PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks [5.210482046387142]
Key Information Extraction from documents remains a challenge.
We introduce PICK, a framework that is effective and robust in handling complex document layouts for KIE.
Our method outperforms baseline methods by significant margins.
arXiv Detail & Related papers (2020-04-16T05:20:16Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes a model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.