Page Layout Analysis of Text-heavy Historical Documents: a Comparison of
Textual and Visual Approaches
- URL: http://arxiv.org/abs/2212.13924v1
- Date: Mon, 12 Dec 2022 10:10:29 GMT
- Title: Page Layout Analysis of Text-heavy Historical Documents: a Comparison of
Textual and Visual Approaches
- Authors: Najem-Meyer Sven, Romanello Matteo
- Abstract summary: Page layout analysis is a fundamental step in document processing which enables the segmentation of a page into regions of interest.
With highly complex layouts and mixed scripts, scholarly commentaries are text-heavy documents which remain challenging for state-of-the-art models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Page layout analysis is a fundamental step in document processing which
enables the segmentation of a page into regions of interest. With highly complex layouts
and mixed scripts, scholarly commentaries are text-heavy documents which remain
challenging for state-of-the-art models. Their layout varies considerably
across editions and their most important regions are mainly defined by semantic
rather than graphical characteristics such as position or appearance. This
setting calls for a comparison between textual, visual and hybrid approaches.
We therefore assess the performance of two transformers (LayoutLMv3 and
RoBERTa) and an object-detection network (YOLOv5). Although results show a clear
advantage in favor of the latter, we also list several caveats to this finding.
In addition to our experiments, we release a dataset of ca. 300 annotated pages
sampled from 19th century commentaries.
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Story Visualization by Online Text Augmentation with Context Memory [64.86944645907771]
We propose a novel memory architecture for the Bi-directional Transformer framework with an online text augmentation.
The proposed method significantly outperforms the state of the art on various metrics, including FID, character F1, frame accuracy, BLEU-2/3, and R-precision.
arXiv Detail & Related papers (2023-08-15T05:08:12Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- Neural Graph Matching for Modification Similarity Applied to Electronic Document Comparison [0.0]
Document comparison is a common task in the legal and financial industries.
In this paper, we present a novel neural graph matching approach applied to document comparison.
arXiv Detail & Related papers (2022-04-12T02:37:54Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z)
- Including Keyword Position in Image-based Models for Act Segmentation of Historical Registers [2.064923532131528]
We focus on the use of both visual and textual information for segmenting historical registers into structured and meaningful units such as acts.
An act is a text recording containing valuable knowledge such as demographic information (baptism, marriage or death) or royal decisions (donation or pardon).
arXiv Detail & Related papers (2021-09-17T11:38:34Z)
- Topical Change Detection in Documents via Embeddings of Long Sequences [4.13878392637062]
We formulate the task of text segmentation as an independent supervised prediction task.
By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information.
Unlike previous approaches, which mostly operate at the sentence level, we consistently use a broader context.
arXiv Detail & Related papers (2020-12-07T12:09:37Z)
- WikiAsp: A Dataset for Multi-domain Aspect-based Summarization [69.13865812754058]
We propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization.
Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation.
Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
arXiv Detail & Related papers (2020-11-16T10:02:52Z)
- The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks [19.099852869845495]
Contextual embeddings derived from transformer-based neural language models have shown state-of-the-art performance for various tasks.
We focus on the problem of textual similarity from two perspectives: matching documents on a granular level, and an abstract level.
We empirically demonstrate, across two datasets from different domains, that despite high performance in abstract document matching as expected, contextual embeddings are consistently (and at times, vastly) outperformed by simple baselines like TF-IDF for more granular tasks.
arXiv Detail & Related papers (2020-11-02T18:41:32Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes part of the content in the source recordset, in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
- Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers [2.5899040911480187]
We introduce a multimodal approach for the semantic segmentation of historical newspapers.
Based on experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features.
Results show consistent improvement of multimodal models in comparison to a strong visual baseline.
arXiv Detail & Related papers (2020-02-14T17:56:18Z)