Related papers: Page Layout Analysis System for Unconstrained Historic Documents

URL: http://arxiv.org/abs/2102.11838v1
Date: Tue, 23 Feb 2021 18:13:36 GMT
Title: Page Layout Analysis System for Unconstrained Historic Documents
Authors: Oldřich Kodym, Michal Hradiš
Abstract summary: We propose extending a CNN-based text baseline detection system by adding line height and text block boundary predictions. We demonstrate that the proposed method performs well on the cBAD baseline detection dataset.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Extraction of text regions and individual text lines from historic documents is necessary for automatic transcription. We propose extending a CNN-based text baseline detection system by adding line height and text block boundary predictions to the model output, allowing the system to extract more comprehensive layout information. We also show that pixel-wise text orientation prediction can be used for processing documents with multiple text orientations. We demonstrate that the proposed method performs well on the cBAD baseline detection dataset. Additionally, we benchmark the method on the newly introduced PERO layout dataset, which we also make public.
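The abstract describes predicting a line height alongside each detected baseline. As an illustration only, the following Python sketch shows one way such outputs could be turned into a text-line polygon. It is not the authors' implementation: the function name `line_polygon`, the split of line height into an `ascender` part (above the baseline) and a `descender` part (below it), the assumption that both are constant along a line, and the choice to offset along the local normal of the baseline polyline are all assumptions of this sketch.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def line_polygon(baseline: List[Point], ascender: float, descender: float) -> List[Point]:
    """Build a closed text-line polygon from a baseline polyline.

    Hypothetical helper, not from the paper: `ascender` and `descender`
    stand in for per-line height predictions (in pixels) and are assumed
    constant along the line. Coordinates follow the image convention
    (y grows downward), and the text is assumed to run from the first
    baseline point to the last one.
    """
    if len(baseline) < 2:
        raise ValueError("a baseline needs at least two points")
    upper: List[Point] = []
    lower: List[Point] = []
    last = len(baseline) - 1
    for i, (x, y) in enumerate(baseline):
        # Local tangent from the neighbouring points (one-sided at the ends).
        (x0, y0), (x1, y1) = baseline[max(i - 1, 0)], baseline[min(i + 1, last)]
        tx, ty = x1 - x0, y1 - y0
        norm = hypot(tx, ty)
        if norm == 0:
            raise ValueError("degenerate baseline: zero-length tangent")
        tx, ty = tx / norm, ty / norm
        # Normal towards the top of the glyphs: rotate the tangent by 90
        # degrees so that (1, 0) maps to (0, -1), i.e. "up" on the page.
        nx, ny = ty, -tx
        upper.append((x + ascender * nx, y + ascender * ny))
        lower.append((x - descender * nx, y - descender * ny))
    # Walk along the top edge, then back along the bottom edge.
    return upper + lower[::-1]


if __name__ == "__main__":
    print(line_polygon([(0, 10), (10, 10)], ascender=3, descender=1))
```

For a horizontal baseline from (0, 10) to (10, 10) with `ascender=3` and `descender=1`, the sketch returns `[(0.0, 7.0), (10.0, 7.0), (10.0, 11.0), (0.0, 11.0)]`, a 10 by 4 rectangle. Offsetting along the local normal rather than along the y axis keeps the polygon aligned with skewed or curved baselines; vertical text is handled as well, provided the baseline points are ordered in reading direction.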

Related papers

FP-THD: Full page transcription of historical documents [0.0]
This work proposes a pipeline for the transcription of historical documents preserving special features. We analyze historical text images using a layout analysis model to extract text lines, which are then processed by an OCR model to generate a fully digitized page.
Date: 2026-01-20T07:13:38Z
LIGHT: Multi-Modal Text Linking on Historical Maps [1.8399976559754367]
LIGHT is a novel multi-modal approach that integrates linguistic, image, and geometric features for linking text on historical maps. It outperforms existing methods on the ICDAR 2024/2025 MapText Competition data.
Date: 2025-06-27T19:18:00Z
Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" (DAT) is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model. A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances. Tests demonstrate that DAT achieves state-of-the-art performance across a variety of text-related benchmarks.
Date: 2024-05-30T07:25:23Z
Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis. Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
Date: 2024-05-13T05:48:35Z
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
Date: 2023-06-06T03:37:41Z
The Learnable Typewriter: A Generative Approach to Text Analysis [17.355857281085164]
We present a generative document-specific approach to character analysis and recognition in text lines. Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters.
Date: 2023-02-03T11:17:59Z
SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map. We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
Date: 2022-11-25T18:59:10Z
Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis. The first hierarchical scene text dataset is introduced to enable this novel research task. We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
Date: 2022-03-28T23:35:45Z
Unsupervised learning of text line segmentation by differentiating coarse patterns [0.0]
We present an unsupervised deep learning method that embeds document image patches to a compact Euclidean space where distances correspond to a coarse text line pattern similarity. Text line segmentation can be easily implemented using standard techniques with the embedded feature vectors. We evaluate the method qualitatively and quantitatively on several variants of text line segmentation datasets to demonstrate its effectiveness.
Date: 2021-05-19T21:21:30Z
Unsupervised deep learning for text line segmentation [0.0]
A common method is to train a deep learning network to embed the document image into an image of blob lines that trace the text lines. This paper presents an unsupervised embedding of document image patches without a need for annotations. We show that the outliers do not harm the convergence and the network learns to discriminate the text lines from the spaces between text lines.
Date: 2020-03-19T08:57:53Z
Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer. In detail, the input is a set of structured records and a reference text for describing another recordset. The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
Date: 2020-02-24T12:52:10Z

This list is automatically generated from the titles and abstracts of the papers on this site.