U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts
- URL: http://arxiv.org/abs/2401.08425v1
- Date: Tue, 16 Jan 2024 15:11:18 GMT
- Authors: Silvia Zottin, Axel De Nardin, Emanuela Colombi, Claudio Piciarelli, Filippo Pavan, Gian Luca Foresti
- Abstract summary: U-DIADS-Bib is a novel, pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in the fields of computer vision and humanities. We propose a novel, computer-aided segmentation pipeline to alleviate the burden of time-consuming manual annotation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Document Layout Analysis, which is the task of identifying different semantic
regions inside a document page, is a subject of great interest for both
computer scientists and humanities scholars, as it represents a fundamental step
towards further analysis tasks for the former and a powerful tool to improve
and facilitate the study of the documents for the latter. However, many of the
works currently present in the literature, especially when it comes to the
available datasets, fail to meet the needs of both worlds and, in particular,
tend to lean towards the needs and common practices of the computer science
side, leading to resources that are not representative of the real needs of
the humanities. For this reason, the present paper introduces U-DIADS-Bib, a novel,
pixel-precise, non-overlapping and noiseless document layout analysis dataset
developed in close collaboration between specialists in the fields of computer
vision and humanities. Furthermore, we propose a novel, computer-aided
segmentation pipeline in order to alleviate the burden represented by the
time-consuming process of manual annotation, necessary for the generation of
the ground truth segmentation maps. Finally, we present a standardized few-shot
version of the dataset (U-DIADS-BibFS), with the aim of encouraging the
development of models and solutions able to address this task with as few
samples as possible, which would allow for more effective use in a real-world
scenario, where collecting a large number of segmentations is not always
feasible.
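The abstract describes the ground truth as pixel-precise and non-overlapping, meaning every pixel of a page belongs to exactly one layout class. As an illustration only, the sketch below assumes annotations arrive as one binary mask per class (a hypothetical input format, not necessarily how U-DIADS-Bib is distributed) and merges them into a single label map, rejecting any pixel claimed by two classes. The class indices used in the example are placeholders, not the dataset's official label set.

```python
from typing import Dict, List, Optional

Mask = List[List[bool]]     # binary mask: True where a class is present
LabelMap = List[List[int]]  # exactly one class index per pixel


def merge_masks(masks: Dict[int, Mask], height: int, width: int,
                background: int = 0) -> LabelMap:
    """Merge per-class binary masks into one non-overlapping label map.

    Assumption (hypothetical input format): one binary mask per class index.
    Raises ValueError if two classes claim the same pixel; pixels claimed
    by no class receive the background index.
    """
    label_map = [[background] * width for _ in range(height)]
    # Track which class (if any) already owns each pixel.
    owner: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    for cls, mask in masks.items():
        if len(mask) != height or any(len(row) != width for row in mask):
            raise ValueError(f"mask for class {cls} has the wrong shape")
        for y in range(height):
            for x in range(width):
                if not mask[y][x]:
                    continue
                if owner[y][x] is not None:
                    raise ValueError(
                        f"pixel ({y}, {x}) claimed by classes "
                        f"{owner[y][x]} and {cls}")
                owner[y][x] = cls
                label_map[y][x] = cls
    return label_map


def class_pixel_counts(label_map: LabelMap) -> Dict[int, int]:
    """Count how many pixels carry each class index."""
    counts: Dict[int, int] = {}
    for row in label_map:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    return counts


# Illustrative 2x3 "page"; indices 1 and 2 are placeholder classes.
example_masks = {
    1: [[True, True, False],
        [False, False, False]],
    2: [[False, False, False],
        [False, True, True]],
}
label_map = merge_masks(example_masks, height=2, width=3)
print(label_map)  # [[1, 1, 0], [0, 2, 2]]
```

Storing a single integer per pixel makes the non-overlap property structural: once merged, a label map cannot assign two classes to the same pixel, so the consistency check only has to run once, at merge time.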