Related papers: GEE-OPs: An Operator Knowledge Base for Geospatial Code Generation on the Google Earth Engine Platform Powered by Large Language Models

GEE-OPs: An Operator Knowledge Base for Geospatial Code Generation on the Google Earth Engine Platform Powered by Large Language Models

URL: http://arxiv.org/abs/2412.05587v2
Date: Wed, 11 Dec 2024 13:56:40 GMT
Title: GEE-OPs: An Operator Knowledge Base for Geospatial Code Generation on the Google Earth Engine Platform Powered by Large Language Models
Authors: Shuyang Hou, Jianyuan Liang, Anqi Zhao, Huayi Wu,
Abstract summary: We propose a framework for building a operator knowledge base tailored to the Google Earth Engine (GEE) JavaScript API.<n>This framework consists of an operator syntax knowledge table, an operator relationship frequency table, an operator frequent pattern knowledge table, and an operator relationship chain knowledge table.<n>We show that the framework achieves over 90% accuracy, recall, and F1 score in operator knowledge extraction.
Score: 0.562479170374811
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As the scale and complexity of spatiotemporal data continue to grow rapidly, the use of geospatial modeling on the Google Earth Engine (GEE) platform presents dual challenges: improving the coding efficiency of domain experts and enhancing the coding capabilities of interdisciplinary users. To address these challenges and improve the performance of large language models (LLMs) in geospatial code generation tasks, we propose a framework for building a geospatial operator knowledge base tailored to the GEE JavaScript API. This framework consists of an operator syntax knowledge table, an operator relationship frequency table, an operator frequent pattern knowledge table, and an operator relationship chain knowledge table. By leveraging Abstract Syntax Tree (AST) techniques and frequent itemset mining, we systematically extract operator knowledge from 185,236 real GEE scripts and syntax documentation, forming a structured knowledge base. Experimental results demonstrate that the framework achieves over 90% accuracy, recall, and F1 score in operator knowledge extraction. When integrated with the Retrieval-Augmented Generation (RAG) strategy for LLM-based geospatial code generation tasks, the knowledge base improves performance by 20-30%. Ablation studies further quantify the necessity of each knowledge table in the knowledge base construction. This work provides robust support for the advancement and application of geospatial code modeling techniques, offering an innovative approach to constructing domain-specific knowledge bases that enhance the code generation capabilities of LLMs, and fostering the deeper integration of generative AI technologies within the field of geoinformatics.

Related papers

Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation [52.8352968531863]
Large language models (LLMs) have made significant progress in general-purpose natural language processing tasks. This paper presents a novel framework that combines knowledge graph (KG) and retrieval-augmented generation (RAG) techniques to enhance LLM performance in the telecom domain.
arXiv Detail & Related papers (2025-03-31T15:58:08Z)
OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
multimodal large language models (LLMs) have opened new frontiers in artificial intelligence. We propose a MLLM (OmniGeo) tailored to geospatial applications. By combining the strengths of natural language understanding and spatial reasoning, our model enhances the ability of instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z)
GeoEdit: Geometric Knowledge Editing for Large Language Models [52.37408324849593]
Regular updates are essential for maintaining up-to-date knowledge in large language models (LLMs) We propose a novel framework called Geometric Knowledge Editing (GeoEdit) GeoEdit distinguishes between neurons associated with new knowledge updates and those related to general knowledge perturbations. For the remaining neurons, we integrate both old and new knowledge for aligned directions and apply a "forget-then-learn" editing strategy for opposite directions.
arXiv Detail & Related papers (2025-02-27T10:27:48Z)
Chain-of-Programming (CoP) : Empowering Large Language Models for Geospatial Code Generation [2.6026969939746705]
This paper proposes a Chain of Programming framework to decompose the code generation process into five steps. The framework incorporates a shared information pool, knowledge base retrieval, and user feedback mechanisms. It significantly improves the logical clarity, syntactical correctness, and executability of the generated code.
arXiv Detail & Related papers (2024-11-16T09:20:35Z)
Geo-FuB: A Method for Constructing an Operator-Function Knowledge Base for Geospatial Code Generation Tasks Using Large Language Models [0.5242869847419834]
This study introduces a framework to construct such a knowledge base, leveraging geospatial script semantics. An example knowledge base, Geo-FuB, built from 154,075 Google Earth Engine scripts, is available on GitHub.
arXiv Detail & Related papers (2024-10-28T12:50:27Z)
An LLM Agent for Automatic Geospatial Data Analysis [5.842462214442362]
Large language models (LLMs) are being used in data science code generation tasks. Their application to geospatial data processing is challenging due to difficulties in incorporating complex data structures and spatial constraints. We introduce GeoAgent, a new interactive framework designed to help LLMs handle geospatial data processing more effectively.
arXiv Detail & Related papers (2024-10-24T14:47:25Z)
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs) We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure. Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z)
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework. By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information. Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph [1.7418328181959968]
The proposed research aims to develop an innovative semantic query processing system. It enables users to obtain comprehensive information about research works produced by Computer Science (CS) researchers at the Australian National University.
arXiv Detail & Related papers (2024-05-24T09:19:45Z)
EVOR: Evolving Retrieval for Code Generation [17.46870626157077]
Existing pipelines for retrieval-augmented code generation employ static knowledge bases with a single source.<n>We develop a novel pipeline, EVOR, that employs the synchronous evolution of both queries and diverse knowledge bases.
arXiv Detail & Related papers (2024-02-19T17:37:28Z)
GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT [6.618846295332767]
Decision-makers in GIS need to combine a series of spatial algorithms and operations to solve geospatial tasks. We develop a new framework called GeoGPT that can conduct geospatial data collection, processing, and analysis in an autonomous manner.
arXiv Detail & Related papers (2023-07-16T03:03:59Z)
KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT) All tasks in KILT are grounded in the same snapshot of Wikipedia. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.