SPOT: Bridging Natural Language and Geospatial Search for Investigative Journalists
- URL: http://arxiv.org/abs/2506.13188v1
- Date: Mon, 16 Jun 2025 07:55:44 GMT
- Title: SPOT: Bridging Natural Language and Geospatial Search for Investigative Journalists
- Authors: Lynn Khellaf, Ipek Baris Schlicht, Tilman Mirass, Julia Bayer, Tilman Wagner, Ruben Bouwmeester,
- Abstract summary: SPOT, an open source natural language interface, makes OSM's rich, tag-based geographic data more accessible through intuitive scene descriptions. It addresses real-world challenges such as hallucinations in model output, inconsistencies in OSM tagging, and the noisy nature of user input. To our knowledge, SPOT is the first system to achieve reliable natural language access to OSM data at this level of accuracy.
- Score: 0.8796261172196742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: OpenStreetMap (OSM) is a vital resource for investigative journalists doing geolocation verification. However, existing tools to query OSM data such as Overpass Turbo require familiarity with complex query languages, creating barriers for non-technical users. We present SPOT, an open source natural language interface that makes OSM's rich, tag-based geographic data more accessible through intuitive scene descriptions. SPOT interprets user inputs as structured representations of geospatial object configurations using fine-tuned Large Language Models (LLMs), with results being displayed in an interactive map interface. While more general geospatial search tasks are conceivable, SPOT is specifically designed for use in investigative journalism, addressing real-world challenges such as hallucinations in model output, inconsistencies in OSM tagging, and the noisy nature of user input. It combines a novel synthetic data pipeline with a semantic bundling system to enable robust, accurate query generation. To our knowledge, SPOT is the first system to achieve reliable natural language access to OSM data at this level of accuracy. By lowering the technical barrier to geolocation verification, SPOT contributes a practical tool to the broader efforts to support fact-checking and combat disinformation.
Related papers
- OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
Multimodal large language models (MLLMs) have opened new frontiers in artificial intelligence. We propose an MLLM (OmniGeo) tailored to geospatial applications. By combining the strengths of natural language understanding and spatial reasoning, our model enhances the ability of instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z)
- Towards a Barrier-free GeoQA Portal: Natural Language Interaction with Geospatial Data Using Multi-Agent LLMs and Semantic Search [2.9658923973538034]
We propose a GeoQA Portal using a multi-agent Large Language Model framework for seamless natural language interaction with geospatial data. Case studies, evaluations, and user tests confirm its effectiveness for non-experts, bridging GIS complexity and public access.
arXiv Detail & Related papers (2025-03-18T13:39:46Z)
- Spot: A Natural Language Interface for Geospatial Searches in OSM [0.9320657506524149]
Spot is a user-friendly natural language interface for querying OpenStreetMap data.
It extracts relevant information from user-input sentences and displays candidate locations matching the descriptions on a map.
All code and generated data is available as an open-source repository.
arXiv Detail & Related papers (2023-11-14T11:35:09Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding [45.36562604939258]
This paper introduces GeoLM, a language model that enhances the understanding of geo-entities in natural language.
We demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing.
arXiv Detail & Related papers (2023-10-23T01:20:01Z)
- Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning [73.0990339667978]
Navigation in unfamiliar environments presents a major challenge for robots.
We use language models to bias exploration of novel real-world environments.
We evaluate our method, Language Frontier Guide (LFG), in challenging real-world environments and simulated benchmarks.
arXiv Detail & Related papers (2023-10-16T06:21:06Z)
- GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models.
We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods.
Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z)
- GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT [6.618846295332767]
Decision-makers in GIS need to combine a series of spatial algorithms and operations to solve geospatial tasks.
We develop a new framework called GeoGPT that can conduct geospatial data collection, processing, and analysis in an autonomous manner.
arXiv Detail & Related papers (2023-07-16T03:03:59Z)
- MGeo: Multi-Modal Geographic Pre-Training Method [49.78466122982627]
We propose a novel query-POI matching method, the Multi-modal Geographic language model (MGeo).
MGeo represents geographic context (GC) as a new modality and is able to fully extract multi-modal correlations for accurate query-POI matching.
Our proposed multi-modal pre-training method can significantly improve the query-POI matching capability of generic PTMs.
arXiv Detail & Related papers (2023-01-11T03:05:12Z)
- Explaining Patterns in Data with Language Models via Interpretable Autoprompting [143.4162028260874]
We introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data.
iPrompt can yield meaningful insights by accurately finding groundtruth dataset descriptions.
Experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.
arXiv Detail & Related papers (2022-10-04T18:32:14Z)
- Towards Natural Language Question Answering over Earth Observation Linked Data using Attention-based Neural Machine Translation [0.0]
This paper seeks to study and analyze the use of RNN-based neural machine translation with attention for transforming natural language questions into GeoSPARQL queries.
A dataset consisting of mappings from natural language questions to GeoSPARQL queries over the Corine Land Cover (CLC) Linked Data has been created to train and validate the deep neural network.
arXiv Detail & Related papers (2021-01-23T06:12:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.