A Multimodal Conversational Assistant for the Characterization of Agricultural Plots from Geospatial Open Data
- URL: http://arxiv.org/abs/2509.17544v2
- Date: Tue, 23 Sep 2025 14:32:50 GMT
- Title: A Multimodal Conversational Assistant for the Characterization of Agricultural Plots from Geospatial Open Data
- Authors: Juan Cañada, Raúl Alonso, Julio Molleda, Fidel Díez
- Abstract summary: This study presents an open-source conversational assistant that integrates multimodal retrieval and large language models (LLMs). The proposed architecture combines orthophotos, Sentinel-2 vegetation indices, and user-provided documents through retrieval-augmented generation (RAG). Preliminary results show that the system is capable of generating clear, relevant, and context-aware responses to agricultural queries.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing availability of open Earth Observation (EO) and agricultural datasets holds great potential for supporting sustainable land management. However, their high technical entry barrier limits accessibility for non-expert users. This study presents an open-source conversational assistant that integrates multimodal retrieval and large language models (LLMs) to enable natural language interaction with heterogeneous agricultural and geospatial data. The proposed architecture combines orthophotos, Sentinel-2 vegetation indices, and user-provided documents through retrieval-augmented generation (RAG), allowing the system to flexibly determine whether to rely on multimodal evidence, textual knowledge, or both in formulating an answer. To assess response quality, we adopt an LLM-as-a-judge methodology using Qwen3-32B in a zero-shot, unsupervised setting, applying direct scoring in a multi-dimensional quantitative evaluation framework. Preliminary results show that the system is capable of generating clear, relevant, and context-aware responses to agricultural queries, while remaining reproducible and scalable across geographic regions. The primary contributions of this work include an architecture for fusing multimodal EO and textual knowledge sources, a demonstration of lowering the barrier to access specialized agricultural information through natural language interaction, and an open and reproducible design.
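The abstract describes a system that "flexibly determine[s] whether to rely on multimodal evidence, textual knowledge, or both in formulating an answer". A minimal sketch of that routing idea is shown below; the function name, cue lists, and keyword heuristic are illustrative assumptions, not the authors' actual implementation (which presumably uses LLM-based or retrieval-score-based routing).

```python
# Illustrative sketch of evidence routing for a multimodal agricultural RAG
# assistant. The cue sets and the keyword heuristic are assumptions for
# demonstration only; the paper's system is not described at this level.

def route_query(query: str) -> str:
    """Decide which evidence pool(s) a query should draw on."""
    visual_cues = {"ndvi", "orthophoto", "vegetation index", "sentinel"}
    textual_cues = {"regulation", "subsidy", "report", "document"}
    q = query.lower()
    wants_visual = any(cue in q for cue in visual_cues)
    wants_text = any(cue in q for cue in textual_cues)
    if wants_visual and wants_text:
        return "both"
    if wants_visual:
        return "multimodal"
    if wants_text:
        return "textual"
    return "both"  # no clear signal: fall back to retrieving from all sources
```

For example, `route_query("Show the NDVI trend for this plot")` would select the multimodal (EO) evidence pool, while a question about subsidy regulations would go to the textual documents.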
Related papers
- AgriWorld:A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents [17.904008870689964]
We present AgriWorld, a Python execution environment exposing unified tools for queries over field parcels, remote-sensing time-series analytics, crop growth simulation, and task-specific predictors (e.g. yield, stress, and disease risk). On top of this environment, we design a multi-turn AgroReflective agent that iteratively writes code, observes execution results, and refines its analysis via an execute-observe-refine loop.
arXiv Detail & Related papers (2026-02-17T03:12:57Z)
- Towards AI Evaluation in Domain-Specific RAG Systems: The AgriHubi Case Study [0.7257685311746803]
AgriHubi is a domain-adapted retrieval-augmented generation system for Finnish-language agricultural decision support. The system shows clear gains in answer completeness, linguistic accuracy, and perceived reliability.
arXiv Detail & Related papers (2026-02-02T15:15:24Z)
- MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation [0.3499870393443268]
Existing datasets often rely on general-domain corpora or purely textual retrieval. We introduce MiRAGE, a multiagent framework for evaluating RAG systems. MiRAGE orchestrates a swarm of specialized agents to generate verified, domain-specific, multimodal, and multi-hop question-answer datasets.
arXiv Detail & Related papers (2026-01-21T21:39:09Z)
- Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation [75.66731090275645]
We introduce MiRAGE, an evaluation framework for retrieval-augmented generation (RAG) from multimodal sources. MiRAGE is a claim-centric approach to multimodal RAG evaluation, consisting of InfoF1, evaluating factuality and information coverage, and CiteF1, measuring citation support and completeness.
arXiv Detail & Related papers (2025-10-28T18:21:19Z)
- Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z)
- Optimizing Agricultural Research: A RAG-Based Approach to Mycorrhizal Fungi Information [1.2349443032034277]
Retrieval-Augmented Generation (RAG) represents a transformative approach within natural language processing (NLP). We present the design and evaluation of a RAG-enabled system tailored for Mycophyto. The framework underscores the potential of AI-driven knowledge discovery to accelerate agroecological innovation and enhance decision-making in sustainable farming systems.
arXiv Detail & Related papers (2025-09-16T20:21:55Z)
- AgriGPT: a Large Language Model Ecosystem for Agriculture [16.497060004913806]
AgriGPT is a domain-specialized Large Language Model ecosystem for agriculture. At its core, we design a scalable data engine that compiles credible data sources into Agri-342K, a high-quality, standardized question-answer dataset. We employ Tri-RAG, a three-channel Retrieval-Augmented Generation framework combining dense retrieval, sparse retrieval, and multi-hop knowledge graph reasoning.
arXiv Detail & Related papers (2025-08-12T04:51:08Z)
- Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain [1.0144032120138065]
This study generates multilingual (English, Hindi, Punjabi) synthetic datasets from agriculture-specific documents from India. Evaluation on human-created datasets demonstrates significant improvements in factuality, relevance, and agricultural consensus.
arXiv Detail & Related papers (2025-07-22T19:25:10Z)
- Multimodal Agricultural Agent Architecture (MA3): A New Paradigm for Intelligent Agricultural Decision-Making [32.62816270192696]
Modern agriculture faces dual challenges: optimizing production efficiency and achieving sustainable development. To address these challenges, this study proposes an innovative Multimodal Agricultural Agent Architecture (MA3). This study constructs a multimodal agricultural agent dataset encompassing five major tasks: classification, detection, Visual Question Answering (VQA), tool selection, and agent evaluation.
arXiv Detail & Related papers (2025-04-07T07:32:41Z)
- A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis [5.006697347461899]
We present the crop disease domain multimodal dataset, a pioneering resource designed to advance the field of agricultural research. The dataset comprises 137,000 images of various crop diseases, accompanied by 1 million question-answer pairs that span a broad spectrum of agricultural knowledge. We demonstrate the utility of the dataset by fine-tuning state-of-the-art multimodal models, showcasing significant improvements in crop disease diagnosis.
arXiv Detail & Related papers (2025-03-10T06:37:42Z)
- A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation.
Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge.
RA-LLMs have emerged to harness external and authoritative knowledge bases, rather than relying on the model's internal knowledge.
arXiv Detail & Related papers (2024-05-10T02:48:45Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Chatmap: Large Language Model Interaction with Cartographic Data [0.0]
OpenStreetMap (OSM) is the most ambitious open-source global initiative offering detailed urban and rural geographic data.
In this study, we demonstrate a proof of concept and detail the process of fine-tuning a relatively small-scale (1B-parameter) Large Language Model (LLM) with a relatively small artificial dataset curated by a more capable teacher model.
The study aims to provide an initial guideline for such generative artificial intelligence (AI) adaptations and demonstrate early signs of useful emerging abilities in this context.
arXiv Detail & Related papers (2023-09-28T15:32:36Z)
- RHO ($\rho$): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding [57.46495388734495]
This paper presents RHO ($\rho$), which utilizes the representations of linked entities and relation predicates from a knowledge graph (KG).
We propose (1) local knowledge grounding to combine textual embeddings with the corresponding KG embeddings; and (2) global knowledge grounding to equip RHO with multi-hop reasoning abilities via the attention mechanism.
arXiv Detail & Related papers (2022-12-03T10:36:34Z)
- A General Purpose Neural Architecture for Geospatial Systems [142.43454584836812]
We present a roadmap towards the construction of a general-purpose neural architecture (GPNA) with a geospatial inductive bias.
We envision how such a model may facilitate cooperation between members of the community.
arXiv Detail & Related papers (2022-11-04T09:58:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.