Navigating the Mise-en-Page: Interpretive Machine Learning Approaches to
the Visual Layouts of Multi-Ethnic Periodicals
- URL: http://arxiv.org/abs/2109.01732v1
- Date: Fri, 3 Sep 2021 21:10:38 GMT
- Title: Navigating the Mise-en-Page: Interpretive Machine Learning Approaches to
the Visual Layouts of Multi-Ethnic Periodicals
- Authors: Benjamin Charles Germain Lee, Joshua Ortiz Baco, Sarah H. Salter, Jim Casey
- Abstract summary: Our method combines Chronicling America's MARC data and the Newspaper Navigator machine learning dataset to identify the visual patterns of newspaper page layouts.
By analyzing high-dimensional visual similarity, we aim to better understand how editors spoke and protested through the layout of their papers.
- Score: 0.19116784879310028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a computational method of analysis that draws from
machine learning, library science, and literary studies to map the visual
layouts of multi-ethnic newspapers from the late 19th and early 20th century
United States. This work departs from prior approaches to newspapers that focus
on individual pieces of textual and visual content. Our method combines
Chronicling America's MARC data and the Newspaper Navigator machine learning
dataset to identify the visual patterns of newspaper page layouts. By analyzing
high-dimensional visual similarity, we aim to better understand how editors
spoke and protested through the layout of their papers.
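As a rough illustration of what comparing page layouts by visual similarity can look like, the sketch below rasterizes bounding boxes of page regions onto a coarse grid and compares two pages with cosine similarity. This is not the authors' pipeline: the grid featurization, the function names, and the sample boxes are all assumptions made for illustration, and the paper's own features come from the Newspaper Navigator dataset.

```python
import math


def layout_vector(boxes, grid=4):
    """Flatten normalized boxes (x0, y0, x1, y1 in [0, 1]) into a
    grid x grid occupancy vector (hypothetical featurization)."""
    cells = [0.0] * (grid * grid)
    step = 1.0 / grid
    for x0, y0, x1, y1 in boxes:
        for row in range(grid):
            for col in range(grid):
                # Overlap between the box and this grid cell on each axis.
                dx = min(x1, (col + 1) * step) - max(x0, col * step)
                dy = min(y1, (row + 1) * step) - max(y0, row * step)
                if dx > 0 and dy > 0:
                    cells[row * grid + col] += dx * dy
    return cells


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up pages: a banner across the top plus a block at lower left,
# an identical layout, and a page with a single block at lower right.
page_a = [(0.0, 0.0, 1.0, 0.25), (0.0, 0.5, 0.5, 1.0)]
page_b = [(0.0, 0.0, 1.0, 0.25), (0.0, 0.5, 0.5, 1.0)]
page_c = [(0.5, 0.5, 1.0, 1.0)]

sim_same = cosine_similarity(layout_vector(page_a), layout_vector(page_b))
sim_diff = cosine_similarity(layout_vector(page_a), layout_vector(page_c))
```

Pages with the same arrangement of regions score close to 1, while pages whose regions occupy disjoint parts of the sheet score 0. A learned embedding would capture far more than this occupancy grid does.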
Related papers
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Entry Separation using a Mixed Visual and Textual Language Model: Application to 19th century French Trade Directories [18.323615434182553]
A key challenge is to correctly segment what constitutes the basic text regions for the target database.
We propose a new pragmatic approach whose efficiency is demonstrated on 19th century French Trade Directories.
By injecting special visual tokens that encode, for instance, indentation or breaks into the token stream of the language model used for NER, we can leverage both textual and visual knowledge simultaneously.
arXiv Detail & Related papers (2023-02-17T15:30:44Z)
- ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding [52.3895498789521]
We propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement.
We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, to learn the proper reading order of documents.
Experimental results show ERNIE-Layout achieves superior performance on various downstream tasks, setting a new state of the art on key information extraction and document question answering.
arXiv Detail & Related papers (2022-10-12T12:59:24Z)
- Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training [62.215025958347105]
We propose a self-supervised learning paradigm with multi-modal masked autoencoders.
We learn cross-modal domain knowledge by reconstructing missing pixels and tokens from randomly masked images and texts.
arXiv Detail & Related papers (2022-09-15T07:26:43Z)
- Automatic Image Content Extraction: Operationalizing Machine Learning in Humanistic Photographic Studies of Large Visual Archives [81.88384269259706]
We introduce the Automatic Image Content Extraction framework for machine learning-based search and analysis of large image archives.
The proposed framework can be applied in several domains in the humanities and social sciences.
arXiv Detail & Related papers (2022-04-05T12:19:24Z)
- DocBed: A Multi-Stage OCR Solution for Documents with Complex Layouts [2.885058600042882]
This work releases a dataset of 3000 fully-annotated, real-world newspaper images from 21 different U.S. states.
It proposes layout segmentation as a precursor to existing optical character recognition (OCR) engines.
It provides a thorough and structured evaluation protocol for isolated layout segmentation and end-to-end OCR.
arXiv Detail & Related papers (2022-02-03T05:21:31Z)
- Digital Editions as Distant Supervision for Layout Analysis of Printed Books [76.29918490722902]
We describe methods for exploiting this semantic markup as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z)
- Neural Content Extraction for Poster Generation of Scientific Papers [84.30128728027375]
The problem of poster generation for scientific papers is under-investigated.
Previous studies focus mainly on poster layout and panel composition, while neglecting the importance of content extraction.
To get both textual and visual elements of a poster panel, a neural extractive model is proposed to extract text, figures and tables of a paper section simultaneously.
arXiv Detail & Related papers (2021-12-16T01:19:37Z)
- The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America [10.446473806802578]
We introduce a visual content recognition model trained on bounding box annotations of photographs, illustrations, maps, comics, and editorial cartoons.
We describe our pipeline that utilizes this deep learning model to extract 7 classes of visual content.
We report the results of running the pipeline on 16.3 million pages from the Chronicling America corpus.
arXiv Detail & Related papers (2020-05-04T15:51:13Z)
- An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers [0.0]
We evaluate 11 different published deep neural network backbone architectures and 9 different tiling and scaling configurations for separating text, tables or table column lines.
We show the influence of the number of labels and the number of training pages on the segmentation quality, which we measure using the Matthews Correlation Coefficient.
Our results show that (depending on the task) Inception-ResNet-v2 and EfficientNet backbones work best and that vertical tiling is generally preferable to other tiling approaches.
arXiv Detail & Related papers (2020-04-15T20:05:54Z)
- Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers [2.5899040911480187]
We introduce a multimodal approach for the semantic segmentation of historical newspapers.
Based on experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features.
Results show consistent improvement of multimodal models in comparison to a strong visual baseline.
arXiv Detail & Related papers (2020-02-14T17:56:18Z)