Accurate Fine-grained Layout Analysis for the Historical Tibetan
Document Based on the Instance Segmentation
- URL: http://arxiv.org/abs/2110.08164v1
- Date: Fri, 15 Oct 2021 15:49:44 GMT
- Title: Accurate Fine-grained Layout Analysis for the Historical Tibetan
Document Based on the Instance Segmentation
- Authors: Penghai Zhao, Weilan Wang, Xiaojuan Wang, Zhengqi Cai, Guowei Zhang,
and Yuqi Lu
- Abstract summary: This paper presents a fine-grained sub-line level layout analysis approach for the Kangyur, a kind of historical Tibetan document.
We introduce an accelerated method to build the dataset which is dynamic and reliable.
Once the network is trained, instances of the text line, sentence, and titles can be segmented and identified.
The experimental results show that the proposed method delivers a decent 72.7% AP on our dataset.
- Score: 0.9420795715422711
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate layout analysis without subsequent text-line segmentation remains an
ongoing challenge, especially when facing the Kangyur, a kind of historical
Tibetan document featuring considerable touching components and mottled
background. Aiming at identifying different regions in document images, layout
analysis is indispensable for subsequent procedures such as character
recognition. However, only a little research has addressed line-level layout
analysis, and the existing methods fail to handle the Kangyur. To
obtain the optimal results, a fine-grained sub-line level layout analysis
approach is presented. Firstly, we introduce an accelerated method to build
the dataset which is dynamic and reliable. Secondly, we enhance SOLOv2
according to the characteristics of the Kangyur. Then, we feed the
enhanced SOLOv2 with the prepared annotation file during the training phase.
Once the network is trained, instances of the text line, sentence, and titles
can be segmented and identified during the inference stage. The experimental
results show that the proposed method delivers a decent 72.7% AP on our
dataset. In general, this preliminary research provides insights into
fine-grained sub-line level layout analysis and validates SOLOv2-based
approaches. We also believe that the proposed methods can be applied to
documents in other languages with various layouts.
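The 72.7% AP figure refers to average precision over segmented instances. The abstract does not say which AP protocol was used, so the following is only a minimal sketch under stated assumptions: masks are represented as sets of pixel coordinates, a prediction counts as correct when its mask IoU with an unmatched ground-truth instance reaches a single threshold, and AP is the non-interpolated mean of precision at each true positive. The helper names (`rect`, `mask_iou`, `average_precision`) and the toy masks are hypothetical and do not come from the paper.

```python
def rect(r0, r1, c0, c1):
    """Hypothetical helper: a rectangular mask as a set of (row, col) pixels."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def mask_iou(a, b):
    """Intersection over union of two masks given as pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(predictions, ground_truths, iou_threshold=0.5):
    """Simplified, non-interpolated AP for one class at one IoU threshold.

    predictions: list of (score, mask) pairs; ground_truths: list of masks.
    This is an illustrative assumption, not the paper's evaluation protocol.
    """
    if not ground_truths:
        return 0.0
    matched = set()  # indices of ground-truth masks already claimed
    hits = []        # 1 = true positive, 0 = false positive, by descending score
    for _, mask in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(1)
        else:
            hits.append(0)
    # Accumulate precision at every rank where a true positive occurs.
    total, true_positives = 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            total += true_positives / rank
    return total / len(ground_truths)


# Toy page: one text-line instance and one title instance.
ground_truths = [rect(0, 2, 0, 10), rect(5, 6, 0, 5)]
predictions = [
    (0.9, rect(0, 2, 0, 8)),  # overlaps the text line with IoU 0.8
    (0.8, rect(8, 9, 0, 5)),  # overlaps nothing, so a false positive
    (0.6, rect(5, 6, 0, 5)),  # matches the title exactly
]
ap = average_precision(predictions, ground_truths)
```

With these toy masks the ranked outcomes are hit, miss, hit, so the sketch yields (1/1 + 2/3) / 2, roughly 0.83. Benchmark-style AP is usually averaged over several IoU thresholds and classes, which this sketch leaves out.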
Related papers
- Advancing Visual Grounding with Scene Knowledge: Benchmark and Method [74.72663425217522]
Visual grounding (VG) aims to establish fine-grained alignment between vision and language.
Most existing VG datasets are constructed using simple description texts.
We propose a novel benchmark of Scene Knowledge-guided Visual Grounding.
(2023-07-21T13:06:02Z)
- The Learnable Typewriter: A Generative Approach to Text Analysis [17.355857281085164]
We present a generative document-specific approach to character analysis and recognition in text lines.
Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters.
(2023-02-03T11:17:59Z)
- Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches [0.0]
Page layout analysis is a fundamental step in document processing that makes it possible to segment a page into regions of interest.
With highly complex layouts and mixed scripts, scholarly annotated are text-heavy documents which remain challenging for state-of-the-art models.
(2022-12-12T10:10:29Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
(2022-03-28T23:35:45Z)
- Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods [1.9938405188113029]
We present a study conducted using three state-of-the-art systems: Doc-UFCN, dhSegment and ARU-Net.
We show that it is possible to build generic models trained on a wide variety of historical document datasets that can correctly segment diverse unseen pages.
(2022-03-23T11:56:25Z)
- Comprehensive Studies for Arbitrary-shape Scene Text Detection [78.50639779134944]
We propose a unified framework for bottom-up scene text detection methods.
Under the unified framework, we ensure consistent settings for non-core modules.
Comprehensive investigations and detailed analyses reveal the advantages and disadvantages of previous models.
(2021-07-25T13:18:55Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
(2020-10-13T21:33:24Z)
- Learning to Summarize Passages: Mining Passage-Summary Pairs from Wikipedia Revision Histories [110.54963847339775]
We propose a method for automatically constructing a passage-to-summary dataset by mining the Wikipedia page revision histories.
In particular, the method mines the main body passages and the introduction sentences which are added to the pages simultaneously.
The constructed dataset contains more than one hundred thousand passage-summary pairs.
(2020-04-06T12:11:50Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text describing another recordset.
The output is a summary that accurately describes part of the content in the source recordset in the same writing style as the reference.
(2020-02-24T12:52:10Z)