The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps
- URL: http://arxiv.org/abs/2306.17059v2
- Date: Mon, 3 Jul 2023 19:38:37 GMT
- Title: The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps
- Authors: Jina Kim, Zekun Li, Yijun Lin, Min Namgung, Leeje Jang, Yao-Yi Chiang
- Abstract summary: mapKurator is an end-to-end system integrating machine learning models with a comprehensive data processing pipeline.
We deployed the mapKurator system and enabled the processing of over 60,000 maps and over 100 million text/place names in the David Rumsey Historical Map collection.
- Score: 7.209761597734092
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Scanned historical maps in libraries and archives are valuable repositories
of geographic data that often do not exist elsewhere. Despite the potential of
machine learning tools like the Google Vision APIs for automatically
transcribing text from these maps into machine-readable formats, they do not
work well with large-sized images (e.g., high-resolution scanned documents),
cannot infer the relation between the recognized text and other datasets, and
are challenging to integrate with post-processing tools. This paper introduces
the mapKurator system, an end-to-end system integrating machine learning models
with a comprehensive data processing pipeline. mapKurator empowers automated
extraction, post-processing, and linkage of text labels from large numbers of
large-dimension historical map scans. The output data, comprising bounding
polygons and recognized text, is in the standard GeoJSON format, making it
easily modifiable within Geographic Information Systems (GIS). The proposed
system allows users to quickly generate valuable data from large numbers of
historical maps for in-depth analysis of the map content and, in turn,
encourages map findability, accessibility, interoperability, and reusability
(FAIR principles). We deployed the mapKurator system and enabled the processing
of over 60,000 maps and over 100 million text/place names in the David Rumsey
Historical Map collection. We also demonstrated a seamless integration of
mapKurator with a collaborative web platform that gives users access to
automated approaches for extracting and linking text labels from historical map
scans and supports collective work to improve the results.
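The abstract names two concrete points: general-purpose OCR services struggle with very large scans, and mapKurator writes its output (bounding polygons plus recognized text) as GeoJSON. The Python sketch below illustrates both under our own assumptions; it is not mapKurator's actual code. It assumes the scan is cut into overlapping fixed-size patches (the 1000-pixel patch size and 800-pixel stride are hypothetical), that each patch-local text polygon is shifted back into full-scan pixel coordinates, and that labels are serialized as a GeoJSON FeatureCollection with a `text` property. All helper names are hypothetical. Coordinates stay in image pixels, so a georeferencing step would be needed before the features align with real-world GIS layers (RFC 7946 GeoJSON expects WGS 84 longitude/latitude).

```python
import json


def patch_starts(length, patch_size, stride):
    """Start offsets of patches covering [0, length); the last patch is
    aligned to the far edge so no pixels are skipped."""
    if length <= patch_size:
        return [0]
    starts = list(range(0, length - patch_size + 1, stride))
    if starts[-1] != length - patch_size:
        starts.append(length - patch_size)
    return starts


def patch_origins(width, height, patch_size=1000, stride=800):
    """Top-left (x, y) corner of every patch, row by row (hypothetical sizes)."""
    return [(x, y)
            for y in patch_starts(height, patch_size, stride)
            for x in patch_starts(width, patch_size, stride)]


def to_scan_coords(polygon, origin):
    """Shift a patch-local polygon [(x, y), ...] by the patch's top-left corner."""
    ox, oy = origin
    return [(x + ox, y + oy) for x, y in polygon]


def label_feature(text, polygon):
    """One GeoJSON Feature for a recognized label; the ring is closed
    (first point repeated at the end) as GeoJSON requires."""
    ring = [list(p) for p in polygon]
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"text": text},
    }


def to_feature_collection(labels):
    """labels: iterable of (text, polygon_in_scan_coords) pairs."""
    return {"type": "FeatureCollection",
            "features": [label_feature(t, p) for t, p in labels]}


if __name__ == "__main__":
    # Hypothetical detection inside the patch whose top-left corner is (800, 0).
    local_box = [(10, 20), (110, 20), (110, 45), (10, 45)]
    scan_box = to_scan_coords(local_box, (800, 0))
    fc = to_feature_collection([("Chicago", scan_box)])
    print(json.dumps(fc))
```

Because the patches overlap, the same label can be detected twice near patch borders; a real pipeline would need to merge such duplicates, which this sketch does not attempt.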
Related papers
- Neural Semantic Map-Learning for Autonomous Vehicles [85.8425492858912]
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment.
Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field.
We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
(2024-10-10T10:10:03Z)
- Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP [0.09208007322096533]
We explore the potential for interactively searching large-scale map collections using natural language inputs.
As a case study, we adopt 562,842 images of maps publicly accessible via the Library of Congress's API.
We present results for example searches created in consultation with staff in the Library of Congress's Geography and Map Division.
(2024-10-02T02:51:02Z)
- CartoMark: a benchmark dataset for map pattern recognition and map content retrieval with machine intelligence [9.652629004863364]
We develop a large-scale benchmark dataset for map text annotation recognition, map scene classification, map super-resolution reconstruction, and map style transferring.
These well-labelled datasets would help state-of-the-art machine intelligence methods perform map feature detection, map pattern recognition, and map content retrieval.
(2023-12-14T01:54:38Z)
- AutoGeoLabel: Automated Label Generation for Geospatial Machine Learning [69.47585818994959]
We evaluate a big data processing pipeline to auto-generate labels for remote sensing data.
We utilize the big geo-data platform IBM PAIRS to dynamically generate such labels in dense urban areas.
(2022-01-31T20:02:22Z)
- Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection [5.872532529455414]
We propose a method to automatically generate an unlimited amount of annotated historical map images for training text detection models.
We show that state-of-the-art text detection models can benefit from the synthetic historical maps.
(2021-12-12T00:27:03Z)
- An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images [6.962949867017594]
This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images.
We have implemented the approach in a system called mapKurator.
(2021-12-03T01:44:38Z)
- MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale [1.5894241142512051]
We present MapReader, a free, open-source software library written in Python for analyzing large map collections (scanned or born-digital).
MapReader allows users with little or no computer vision expertise to retrieve maps via web-servers.
We show how the outputs from the MapReader pipeline can be linked to other, external datasets.
(2021-11-30T17:37:01Z)
- HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps [81.86923212296863]
HD maps are maps that precisely define road lanes and carry rich semantics about traffic rules.
Only a small number of real-world road topologies and geometries are available, which significantly limits our ability to test the self-driving stack.
We propose HDMapGen, a hierarchical graph generation model capable of producing high-quality and diverse HD maps.
(2021-06-28T17:59:30Z)
- Scaling Systematic Literature Reviews with Machine Learning Pipelines [57.82662094602138]
Systematic reviews entail the extraction of data from scientific documents.
We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs.
We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation.
(2020-10-09T16:19:42Z)
- OpenStreetMap: Challenges and Opportunities in Machine Learning and Remote Sensing [66.23463054467653]
We present a review of recent methods based on machine learning to improve and use OpenStreetMap data.
We believe that OSM can change the way we interpret remote sensing data and that the synergy with machine learning can scale participatory map making.
(2020-07-13T09:58:14Z)
- Voxel Map for Visual SLAM [57.07800982410967]
We propose a voxel-map representation to efficiently map points for visual SLAM.
The points selected by our method are geometrically guaranteed to fall within the camera field of view, and occluded points can be identified and removed to a certain extent.
Experimental results show that our voxel map representation is as efficient as a map with 5s and provides significantly higher localization accuracy (average 46% improvement in RMSE) on the EuRoC dataset.
(2020-03-04T18:39:14Z)