Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP
- URL: http://arxiv.org/abs/2410.01190v1
- Date: Wed, 02 Oct 2024 02:51:02 GMT
- Title: Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP
- Authors: Jamie Mahowald, Benjamin Charles Germain Lee
- Abstract summary: We explore the potential for interactively searching large-scale map collections using natural language inputs.
As a case study, we adopt 562,842 images of maps publicly accessible via the Library of Congress's API.
We present results for example searches created in consultation with staff in the Library of Congress's Geography and Map Division.
- Score: 0.09208007322096533
- Abstract: Despite the prevalence and historical importance of maps in digital collections, current methods of navigating and exploring map collections are largely restricted to catalog records and structured metadata. In this paper, we explore the potential for interactively searching large-scale map collections using natural language inputs ("maps with sea monsters"), visual inputs (i.e., reverse image search), and multimodal inputs (an example map + "more grayscale"). As a case study, we adopt 562,842 images of maps publicly accessible via the Library of Congress's API. To accomplish this, we use the multimodal Contrastive Language-Image Pre-training (CLIP) machine learning model to generate embeddings for these maps, and we develop code to implement exploratory search capabilities with these input strategies. We present results for example searches created in consultation with staff in the Library of Congress's Geography and Map Division and describe the strengths, weaknesses, and possibilities for these search queries. Moreover, we introduce a fine-tuning dataset of 10,504 map-caption pairs, along with an architecture for fine-tuning a CLIP model on this dataset. To facilitate re-use, we provide all of our code in documented, interactive Jupyter notebooks and place all code into the public domain. Lastly, we discuss the opportunities and challenges for applying these approaches across both digitized and born-digital collections held by galleries, libraries, archives, and museums.
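The three input strategies named in the abstract (text, image, and multimodal) can all be read as nearest-neighbor search over a shared embedding space. The sketch below is a minimal illustration of that idea, not the authors' code: the three-dimensional vectors stand in for real CLIP embeddings, and the `combine` function with its `text_weight` parameter is a hypothetical way to blend an example image with a text modifier. The abstract does not say how the paper combines the two.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def combine(image_vec, text_vec, text_weight=0.5):
    """Hypothetical multimodal query: weighted sum of two unit-length embeddings.

    Assumption for illustration only; the source does not specify this formula.
    """
    img, txt = normalize(image_vec), normalize(text_vec)
    mixed = [(1 - text_weight) * i + text_weight * t for i, t in zip(img, txt)]
    return normalize(mixed)


def search(query_vec, index, k=3):
    """Return the ids of the k items most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), item_id) for item_id, vec in index.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Toy stand-ins for precomputed map embeddings (real CLIP vectors are much longer).
index = {
    "map_a": [1.0, 0.0, 0.0],
    "map_b": [0.0, 1.0, 0.0],
    "map_c": [0.7, 0.7, 0.0],
}

text_query = [0.0, 1.0, 0.0]   # placeholder for an embedded text prompt
image_query = [1.0, 0.0, 0.0]  # placeholder for an embedded example map

print(search(text_query, index, k=1))                        # text-only search
print(search(image_query, index, k=1))                       # reverse image search
print(search(combine(image_query, text_query), index, k=1))  # multimodal search
```

In a real pipeline the query and index vectors would come from a CLIP model's image and text encoders, and at the scale of hundreds of thousands of maps the linear scan in `search` would likely be replaced by a vectorized or approximate nearest-neighbor lookup.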
Related papers
- VecCity: A Taxonomy-guided Library for Map Entity Representation Learning [48.73446321300362]
Map entity representation learning (MapRL) generates versatile and reusable data representations.
We propose a novel taxonomy for MapRL that organizes models by functional module, such as encoders, pre-training tasks, and downstream tasks.
We present a taxonomy-driven library, VecCity, which offers easy-to-use interfaces for encoding, pre-training, fine-tuning, and evaluation.
arXiv Detail & Related papers (2024-10-31T07:03:46Z) - Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision.
We construct a taxonomy and review the most prominent papers in recent years.
We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z) - CartoMark: a benchmark dataset for map pattern recognition and map
content retrieval with machine intelligence [9.652629004863364]
We develop a large-scale benchmark dataset for map text annotation recognition, map scene classification, map super-resolution reconstruction, and map style transferring.
These well-labelled datasets would help state-of-the-art machine intelligence technologies perform map feature detection, map pattern recognition, and map content retrieval.
arXiv Detail & Related papers (2023-12-14T01:54:38Z) - The mapKurator System: A Complete Pipeline for Extracting and Linking
Text from Historical Maps [7.209761597734092]
mapKurator is an end-to-end system integrating machine learning models with a comprehensive data processing pipeline.
We deployed the mapKurator system and enabled the processing of over 60,000 maps and over 100 million text/place names in the David Rumsey Historical Map collection.
arXiv Detail & Related papers (2023-06-29T16:05:40Z) - Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - Synthetic Map Generation to Provide Unlimited Training Data for
Historical Map Text Detection [5.872532529455414]
We propose a method to automatically generate an unlimited amount of annotated historical map images for training text detection models.
We show that the state-of-the-art text detection models can benefit from the synthetic historical maps.
arXiv Detail & Related papers (2021-12-12T00:27:03Z) - An Automatic Approach for Generating Rich, Linked Geo-Metadata from
Historical Map Images [6.962949867017594]
This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images.
We have implemented the approach in a system called mapKurator.
arXiv Detail & Related papers (2021-12-03T01:44:38Z) - MapReader: A Computer Vision Pipeline for the Semantic Exploration of
Maps at Scale [1.5894241142512051]
We present MapReader, a free, open-source software library written in Python for analyzing large map collections (scanned or born-digital).
MapReader allows users with little or no computer vision expertise to retrieve maps via web-servers.
We show how the outputs from the MapReader pipeline can be linked to other, external datasets.
arXiv Detail & Related papers (2021-11-30T17:37:01Z) - One-shot Key Information Extraction from Document with Deep Partial
Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z) - DOC2PPT: Automatic Presentation Slides Generation from Scientific
Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.