Making History Readable
- URL: http://arxiv.org/abs/2411.17600v1
- Date: Tue, 26 Nov 2024 17:06:58 GMT
- Title: Making History Readable
- Authors: Bipasha Banerjee, Jennifer Goyne, William A. Ingram,
- Abstract summary: This poster highlights three collections focusing on handwritten letters, newspapers, and digitized topographic maps.
We discuss the challenges with each collection and detail our approaches to address them.
Our proposed methods aim to enhance the user experience by making the contents in these collections easier to search and navigate.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The Virginia Tech University Libraries (VTUL) Digital Library Platform (DLP) hosts digital collections that offer our users access to a wide variety of documents of historical and cultural importance. These collections are not only of academic importance but also provide our users with a glance at local historical events. Our DLP contains collections comprising digital objects featuring complex layouts, faded imagery, and hard-to-read handwritten text, which makes providing online access to these materials challenging. To address these issues, we integrate AI into our DLP workflow and convert the text in the digital objects into a machine-readable format. To enhance the user experience with our historical collections, we use custom AI agents for handwriting recognition, text extraction, and large language models (LLMs) for summarization. This poster highlights three collections focusing on handwritten letters, newspapers, and digitized topographic maps. We discuss the challenges with each collection and detail our approaches to address them. Our proposed methods aim to enhance the user experience by making the contents in these collections easier to search and navigate.
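The workflow described in the abstract (handwriting recognition, text extraction, then LLM summarization) can be sketched as a simple pipeline. The function names, the `DigitalObject` record, and the rule-based "summarizer" below are illustrative stand-ins, not the actual VTUL implementation; a real system would call an HTR model and an LLM where the stubs are.

```python
from dataclasses import dataclass

@dataclass
class DigitalObject:
    identifier: str
    raw_text: str = ""      # filled in by the recognition step
    summary: str = ""       # filled in by the summarization step

def recognize_handwriting(scan_path: str) -> str:
    """Stand-in for an HTR/OCR agent; a real system would invoke a model here."""
    return f"Transcribed text of {scan_path}"

def summarize(text: str, max_words: int = 8) -> str:
    """Stand-in for an LLM summarizer; truncation is used only for illustration."""
    return " ".join(text.split()[:max_words])

def process(scan_path: str, identifier: str) -> DigitalObject:
    obj = DigitalObject(identifier)
    obj.raw_text = recognize_handwriting(scan_path)
    obj.summary = summarize(obj.raw_text)
    return obj

result = process("letters/1862_smith.tiff", "vtul:0001")
print(result.summary)
```

The point of the sketch is the chaining: each stage enriches the same digital object, so the machine-readable text and summary can be indexed for search and navigation.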
Related papers
- Knowledge Graphs for Digitized Manuscripts in Jagiellonian Digital Library Application [8.732274235941974]
Galleries, libraries, archives, and museums (GLAM institutions) are actively digitizing their holdings, creating extensive digital collections. These collections are often enriched with metadata describing the items but not their actual contents. We explore an integrated methodology combining computer vision (CV), artificial intelligence (AI), and semantic web technologies to enrich metadata and construct knowledge graphs for digitized manuscripts and incunabula.
arXiv Detail & Related papers (2025-05-29T14:49:24Z) - Towards Visual Text Grounding of Multimodal Large Language Model [88.0588924255417]
We introduce TRIG, a novel task with a newly designed instruction dataset for benchmarking text-rich image grounding.
Specifically, we propose an OCR-LLM-human interaction pipeline to create 800 manually annotated question-answer pairs as a benchmark.
A comprehensive evaluation of various MLLMs on our proposed benchmark exposes substantial limitations in their grounding capability on text-rich images.
arXiv Detail & Related papers (2025-04-07T12:01:59Z) - ParsiPy: NLP Toolkit for Historical Persian Texts in Python [1.637832760977605]
This work introduces ParsiPy, an NLP toolkit to handle phonetic transcriptions and analyze ancient texts.
ParsiPy offers modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding.
arXiv Detail & Related papers (2025-03-22T16:21:29Z) - Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora [2.3251886193174114]
We present an automated pipeline that evaluates the potential information gain from text collections without requiring model training or fine-tuning. Our method generates multiple-choice questions (MCQs) from texts and measures an LLM's performance both with and without access to the source material. We validate our approach using five strategically selected datasets: EPFL PhD manuscripts, a private collection of historical records, two sets of Wikipedia articles on related topics, and a synthetic baseline dataset.
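The core measurement in this pipeline can be sketched in a few lines: score the model on the generated MCQs with and without the source documents, and treat the accuracy gap as the collection's information potential. The answer lists below are mock data standing in for real model outputs; the function names are my own, not the paper's.

```python
def accuracy(predictions, gold):
    """Fraction of MCQ answers matching the gold key."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

def information_potential(answers_closed_book, answers_open_book, gold):
    """Accuracy gain from seeing the source: a large gap suggests the
    collection contains information the model does not already know."""
    return accuracy(answers_open_book, gold) - accuracy(answers_closed_book, gold)

gold = ["A", "C", "B", "D", "A"]
closed = ["A", "B", "B", "A", "C"]   # model guesses without the documents
opened = ["A", "C", "B", "D", "C"]   # model answers with the documents

gain = information_potential(closed, opened, gold)
print(round(gain, 2))  # 0.8 open-book minus 0.4 closed-book
```

A gap near zero would mean the model can already answer the questions from its parametric knowledge, so the corpus adds little.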
arXiv Detail & Related papers (2025-02-19T13:03:06Z) - Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era [50.19334853510935]
Recent strides in instruction-based editing have enabled intuitive interaction with visual content, using natural language as a bridge between user intent and complex editing operations.
We aim to democratize powerful visual editing across various industries, from entertainment to education.
arXiv Detail & Related papers (2024-11-15T05:18:15Z) - A Library Perspective on Supervised Text Processing in Digital Libraries: An Investigation in the Biomedical Domain [3.9519587827662397]
We focus on relation extraction and text classification, using the showcase of eight biomedical benchmarks.
We consider trade-offs between accuracy and application costs, dive into training data generation through distant supervision and large language models such as ChatGPT, LLama, and Olmo, and discuss how to design final pipelines.
arXiv Detail & Related papers (2024-11-06T07:54:10Z) - Unlocking Comics: The AI4VA Dataset for Visual Understanding [62.345344799258804]
This paper presents a novel dataset comprising Franco-Belgian comics from the 1950s annotated for tasks including depth estimation, semantic segmentation, saliency detection, and character identification.
It consists of two distinct and consistent styles and incorporates object concepts and labels taken from natural images.
By including such diverse information across styles, this dataset not only holds promise for computational creativity but also offers avenues for the digitization of art and storytelling innovation.
arXiv Detail & Related papers (2024-10-27T14:27:05Z) - Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP [0.09208007322096533]
We explore the potential for interactively searching large-scale map collections using natural language inputs.
As a case study, we adopt 562,842 images of maps publicly accessible via the Library of Congress's API.
We present results for example searches created in consultation with staff in the Library of Congress's Geography and Map Division.
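The retrieval idea behind CLIP-based map search can be sketched with plain cosine similarity: text and images share one embedding space, so a natural-language query ranks maps by similarity. The tiny hand-made vectors and map identifiers below are fabricated stand-ins for real CLIP embeddings of the Library of Congress images.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy index: in a real system each vector would come from CLIP's image encoder.
map_embeddings = {
    "topo_appalachia_1931": [0.9, 0.1, 0.2],
    "nautical_chesapeake_1895": [0.2, 0.8, 0.3],
    "city_plan_richmond_1907": [0.1, 0.3, 0.9],
}

def search(query_embedding, index, k=2):
    """Return the k map identifiers most similar to the query embedding."""
    ranked = sorted(index, key=lambda name: cosine(query_embedding, index[name]),
                    reverse=True)
    return ranked[:k]

# A query such as "mountain topography" would be embedded by CLIP's text
# encoder; here the vector is fabricated to lie near the first map.
query = [0.95, 0.05, 0.1]
print(search(query, map_embeddings))
```

At the scale of half a million maps, the linear scan would be replaced by an approximate nearest-neighbor index, but the ranking principle is the same.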
arXiv Detail & Related papers (2024-10-02T02:51:02Z) - Enhancing Visual Document Understanding with Contrastive Learning in
Large Visual-Language Models [56.76307866160105]
We propose a contrastive learning framework, termed Document Object COntrastive learning (DoCo).
DoCo leverages an auxiliary multimodal encoder to obtain the features of document objects and align them to the visual features generated by the vision encoder of Large Visual-Language Models (LVLMs).
We demonstrate that the proposed DoCo serves as a plug-and-play pre-training method, which can be employed in the pre-training of various LVLMs without inducing any increase in computational complexity during the inference process.
arXiv Detail & Related papers (2024-02-29T10:17:27Z) - Towards Improving Document Understanding: An Exploration on
Text-Grounding via MLLMs [96.54224331778195]
We present a text-grounding document understanding model, termed TGDoc, which enhances MLLMs with the ability to discern the spatial positioning of text within images.
We formulate instruction tuning tasks including text detection, recognition, and spotting to facilitate the cohesive alignment between the visual encoder and large language model.
Our method achieves state-of-the-art performance across multiple text-rich benchmarks, validating the effectiveness of our method.
arXiv Detail & Related papers (2023-11-22T06:46:37Z) - Curatr: A Platform for Semantic Analysis and Curation of Historical
Literary Texts [5.075506385456811]
This paper presents Curatr, an online platform for the exploration and curation of literature with machine learning-supported semantic search.
The platform combines neural word embeddings with expert domain knowledge to enable the generation of thematic lexicons.
arXiv Detail & Related papers (2023-06-13T15:15:31Z) - Digital Editions as Distant Supervision for Layout Analysis of Printed
Books [76.29918490722902]
We describe methods for exploiting this semantic markup as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z) - Handwriting Classification for the Analysis of Art-Historical Documents [6.918282834668529]
We focus on the analysis of handwriting in scanned documents from the art-historic archive of the WPI.
We propose a handwriting classification model that labels extracted text fragments based on their visual structure.
arXiv Detail & Related papers (2020-11-04T13:06:46Z) - Enabling Language Models to Fill in the Blanks [81.59381915581892]
We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document.
We train (or fine-tune) off-the-shelf language models on sequences containing the concatenation of artificially-masked text and the text which was masked.
We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics.
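The training-data construction behind infilling by language modeling can be sketched as pure string manipulation: mask a span, then concatenate the masked text with the answer so an off-the-shelf LM learns to predict the missing span from the left context. The token strings below (`[blank]`, `[sep]`, `[answer]`) follow the general idea rather than the paper's exact vocabulary.

```python
def make_infilling_example(text: str, start: int, end: int) -> str:
    """Build one training sequence: masked text, separator, then the answer."""
    masked = text[:start] + "[blank]" + text[end:]
    answer = text[start:end]
    return f"{masked} [sep] {answer} [answer]"

sentence = "She ate leftover pasta for lunch."
example = make_infilling_example(sentence, 8, 22)  # mask "leftover pasta"
print(example)
```

At inference time the model is given only the masked text plus the separator and generates the span itself, which is what lets a standard left-to-right LM fill blanks at arbitrary positions.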
arXiv Detail & Related papers (2020-05-11T18:00:03Z) - Historical Document Processing: A Survey
of Techniques, Tools, and Trends [0.0]
Historical Document Processing is the process of digitizing written material from the past for future use by historians and other scholars.
It incorporates algorithms and software tools from various subfields of computer science, including computer vision, document analysis and recognition, natural language processing, and machine learning.
arXiv Detail & Related papers (2020-02-15T01:54:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.