GeoGPT-RAG Technical Report
- URL: http://arxiv.org/abs/2509.09686v2
- Date: Mon, 15 Sep 2025 01:00:13 GMT
- Title: GeoGPT-RAG Technical Report
- Authors: Fei Huang, Fan Wu, Zeqing Zhang, Qihao Wang, Long Zhang, Grant Michael Boquet, Hongyang Chen,
- Abstract summary: GeoGPT is an open large language model system built to advance research in the geosciences. RAG augments model outputs with relevant information retrieved from an external knowledge source. GeoGPT uses RAG to draw from the GeoGPT Library, a specialized corpus curated for geoscientific content.
- Score: 48.23789135946953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: GeoGPT is an open large language model system built to advance research in the geosciences. To enhance its domain-specific capabilities, we integrated Retrieval Augmented Generation (RAG), which augments model outputs with relevant information retrieved from an external knowledge source. GeoGPT uses RAG to draw from the GeoGPT Library, a specialized corpus curated for geoscientific content, enabling it to generate accurate, context-specific answers. Users can also create personalized knowledge bases by uploading their own publication lists, allowing GeoGPT to retrieve and respond using user-provided materials. To further improve retrieval quality and domain alignment, we fine-tuned both the embedding model and a ranking model that scores retrieved passages by relevance to the query. These enhancements optimize RAG for geoscience applications and significantly improve the system's ability to deliver precise and trustworthy outputs. GeoGPT reflects a strong commitment to open science through its emphasis on collaboration, transparency, and community-driven development. As part of this commitment, we have open-sourced two core RAG components, GeoEmbedding and GeoReranker, to support geoscientists, researchers, and professionals worldwide with powerful, accessible AI tools.
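The abstract describes a two-stage flow: an embedding model retrieves candidate passages, then a ranking model scores them by relevance to the query. The sketch below illustrates that retrieve-then-rerank pattern in general terms only. It is not the GeoGPT implementation: `embed` and `rerank_score` are hypothetical toy stand-ins (bag-of-words counts and term overlap) for fine-tuned models such as GeoEmbedding and GeoReranker, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k):
    """Stage 1: rank every passage by embedding similarity, keep the top k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def rerank_score(query, passage):
    """Toy stand-in for a reranker: fraction of query terms found in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def retrieve_and_rerank(query, corpus, k=3, top_n=1):
    """Stage 2: re-score the k retrieved candidates and keep the best top_n."""
    candidates = retrieve(query, corpus, k)
    reranked = sorted(candidates, key=lambda p: rerank_score(query, p), reverse=True)
    return reranked[:top_n]


def build_prompt(query, passages):
    """Prepend the selected passages to the question as context for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

In a real system the first stage would typically query a vector index rather than scan the whole corpus, and the second stage would score each query-passage pair jointly with a learned model; the split into a cheap retrieval pass and a more precise reranking pass is the part this sketch is meant to show.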
Related papers
- GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics [91.17301794848025]
This paper presents GeoAgent, a model capable of reasoning closely with humans and deriving fine-grained address conclusions. Previous RL-based methods have achieved breakthroughs in performance and interpretability, but concerns remain because of their reliance on AI-generated chain-of-thought (CoT) data and training strategies.
arXiv Detail & Related papers (2026-02-13T04:48:05Z) - GeoEvolve: Automating Geospatial Model Discovery via Multi-Agent Large Language Models [49.257706111340134]
We introduce GeoEvolve, a multi-agent LLM framework that couples evolutionary search with geospatial domain knowledge. We evaluate it on two fundamental and classical tasks: spatial interpolation (kriging) and spatial uncertainty quantification. It reduces spatial interpolation error (RMSE) by 13-21% and improves uncertainty estimation performance by 17%.
arXiv Detail & Related papers (2025-09-25T21:03:57Z) - Geo-Semantic-Parsing: AI-powered geoparsing by traversing semantic knowledge graphs [0.7422344184734279]
We introduce a novel geoparsing and geotagging technique called Geo-Semantic-Parsing (GSP). GSP identifies location references in free text and extracts the corresponding geographic coordinates. We evaluate GSP on a well-known reference dataset including almost 10k event-related tweets.
arXiv Detail & Related papers (2025-03-03T10:30:23Z) - Geo-FuB: A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models [0.5242869847419834]
This study introduces a framework to construct such a knowledge base, leveraging geospatial script semantics.
An example knowledge base, Geo-FuB, built from 154,075 Google Earth Engine scripts, is available on GitHub.
arXiv Detail & Related papers (2024-10-28T12:50:27Z) - Geometric Feature Enhanced Knowledge Graph Embedding and Spatial Reasoning [8.561588656662419]
Geospatial Knowledge Graphs (GeoKGs) model geoentities and spatial relationships in an interconnected manner.
Existing methods for mining and reasoning from GeoKGs, such as popular knowledge graph embedding (KGE) techniques, lack geographic awareness.
This study aims to enhance general-purpose KGE by developing new strategies and integrating geometric features of spatial relations.
arXiv Detail & Related papers (2024-10-24T00:53:48Z) - GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks [1.7687829461198472]
This paper presents and open-sources the GeoCode-PT and GeoCode-SFT corpora, along with the GeoCode-Eval evaluation dataset.
By leveraging QLoRA and LoRA for pretraining and fine-tuning, we introduce GeoCode-GPT-7B, the first LLM focused on geospatial code generation.
Experimental results show that GeoCode-GPT outperforms other models in multiple-choice accuracy by 9.1% to 32.1%, in code summarization ability by 5.4% to 21.7%, and in code generation capability by 1.2% to 25.1%.
arXiv Detail & Related papers (2024-10-22T13:57:55Z) - GeoGalactica: A Scientific Large Language Model in Geoscience [95.15911521220052]
Large language models (LLMs) have achieved huge success for their general knowledge and ability to solve a wide spectrum of tasks in natural language processing (NLP).
We specialize an LLM into geoscience, by further pre-training the model with a vast amount of texts in geoscience, as well as supervised fine-tuning (SFT) the resulting model with our custom collected instruction tuning dataset.
We train GeoGalactica on a geoscience-related text corpus containing 65 billion tokens, which stands as the largest geoscience-specific text corpus.
Then we fine-tune the model with 1 million pairs of instruction-tuning
arXiv Detail & Related papers (2023-12-31T09:22:54Z) - GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT [6.618846295332767]
Decision-makers in GIS need to combine a series of spatial algorithms and operations to solve geospatial tasks.
We develop a new framework called GeoGPT that can conduct geospatial data collection, processing, and analysis in an autonomous manner.
arXiv Detail & Related papers (2023-07-16T03:03:59Z) - K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization [105.89544876731942]
Large language models (LLMs) have achieved great success in general domains of natural language processing.
We present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience.
arXiv Detail & Related papers (2023-06-08T09:29:05Z) - GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark [56.08664336835741]
We propose a GeoGraphic Language Understanding Evaluation benchmark, named GeoGLUE.
We collect data from open-released geographic resources and introduce six natural language understanding tasks.
We provide evaluation experiments and analysis of general baselines, indicating the effectiveness and significance of the GeoGLUE benchmark.
arXiv Detail & Related papers (2023-05-11T03:21:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.