Evaluation of a Region Proposal Architecture for Multi-task Document
Layout Analysis
- URL: http://arxiv.org/abs/2106.11797v1
- Date: Tue, 22 Jun 2021 14:07:27 GMT
- Title: Evaluation of a Region Proposal Architecture for Multi-task Document
Layout Analysis
- Authors: Lorenzo Quir\'os and Enrique Vidal
- Abstract summary: Mask-RCNN architecture is designed to address the problem of baseline detection and region segmentation.
We present experimental results on two handwritten text datasets and one handwritten music dataset.
The analyzed architecture yields promising results, outperforming state-of-the-art techniques in all three datasets.
- Score: 0.685316573653194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically recognizing the layout of handwritten documents is an important
step towards useful extraction of information from those documents. The most
common application is to feed downstream applications such as automatic text
recognition and keyword spotting; however, the recognition of the layout also
helps to establish relationships between elements in the document which allows
to enrich the information that can be extracted. Most of the modern document
layout analysis systems are designed to address only one part of the document
layout problem, namely: baseline detection or region segmentation. In contrast,
we evaluate the effectiveness of the Mask-RCNN architecture to address the
problem of baseline detection and region segmentation in an integrated manner.
We present experimental results on two handwritten text datasets and one
handwritten music dataset. The analyzed architecture yields promising results,
outperforming state-of-the-art techniques in all three datasets.
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z) - Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build a novel hypergraph attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time.
Our results on FUNSD, CORD, XFUNDIE show that our method can effectively improve the performance of semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z) - Knowledge-Driven Cross-Document Relation Extraction [3.868708275322908]
Relation extraction (RE) is a well-known NLP application often treated as a sentence- or document-level task.
We propose a novel approach, KXDocRE, that embed domain knowledge of entities with input text for cross-document RE.
arXiv Detail & Related papers (2024-05-22T11:30:59Z) - Object Recognition from Scientific Document based on Compartment Refinement Framework [2.699900017799093]
It has become increasingly important to extract valuable information from vast resources efficiently.
Current data extraction methods for scientific documents typically use rule-based (RB) or machine learning (ML) approaches.
We propose a new document layout analysis framework called CTBR(Compartment & Text Blocks Refinement)
arXiv Detail & Related papers (2023-12-14T15:36:49Z) - TransDocAnalyser: A Framework for Offline Semi-structured Handwritten
Document Analysis in the Legal Domain [3.5018563401895455]
We build the first semi-structured document analysis dataset in the legal domain.
This dataset combines a wide variety of handwritten text with printed text.
We propose an end-to-end framework for offline processing of handwritten semi-structured documents.
arXiv Detail & Related papers (2023-06-03T15:56:30Z) - Layout-Aware Information Extraction for Document-Grounded Dialogue:
Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z) - RDU: A Region-based Approach to Form-style Document Understanding [69.29541701576858]
Key Information Extraction (KIE) is aimed at extracting structured information from form-style documents.
We develop a new KIE model named Region-based Understanding Document (RDU)
RDU takes as input the text content and corresponding coordinates of a document, and tries to predict the result by localizing a bounding-box-like region.
arXiv Detail & Related papers (2022-06-14T14:47:48Z) - Combining Deep Learning and Reasoning for Address Detection in
Unstructured Text Documents [0.0]
We propose a hybrid approach that combines deep learning with reasoning for finding and extracting addresses from unstructured text documents.
We use a visual deep learning model to detect the boundaries of possible address regions on the scanned document images.
arXiv Detail & Related papers (2022-02-07T12:32:00Z) - Cross-Domain Document Object Detection: Benchmark Suite and Method [71.4339949510586]
Document object detection (DOD) is fundamental for downstream tasks like intelligent document editing and understanding.
We investigate cross-domain DOD, where the goal is to learn a detector for the target domain using labeled data from the source domain and only unlabeled data from the target domain.
For each dataset, we provide the page images, bounding box annotations, PDF files, and the rendering layers extracted from the PDF files.
arXiv Detail & Related papers (2020-03-30T03:04:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.