Related papers: Towards AI Evaluation in Domain-Specific RAG Systems: The AgriHubi Case Study

URL: http://arxiv.org/abs/2602.02208v1
Date: Mon, 02 Feb 2026 15:15:24 GMT
Title: Towards AI Evaluation in Domain-Specific RAG Systems: The AgriHubi Case Study
Authors: Md. Toufique Hasan, Ayman Asad Khan, Mika Saari, Vaishnavi Bankhele, Pekka Abrahamsson
Abstract summary: AgriHubi is a domain-adapted retrieval-augmented generation system for Finnish-language agricultural decision support. The system shows clear gains in answer completeness, linguistic accuracy, and perceived reliability.
Score: 0.7257685311746803
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models show promise for knowledge-intensive domains, yet their use in agriculture is constrained by weak grounding, English-centric training data, and limited real-world evaluation. These issues are amplified for low-resource languages, where high-quality domain documentation exists but remains difficult to access through general-purpose models. This paper presents AgriHubi, a domain-adapted retrieval-augmented generation (RAG) system for Finnish-language agricultural decision support. AgriHubi integrates Finnish agricultural documents with open PORO family models and combines explicit source grounding with user feedback to support iterative refinement. Developed over eight iterations and evaluated through two user studies, the system shows clear gains in answer completeness, linguistic accuracy, and perceived reliability. The results also reveal practical trade-offs between response quality and latency when deploying larger models. This study provides empirical guidance for designing and evaluating domain-specific RAG systems in low-resource language settings.

Related papers

Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory [0.0]
Access to reliable agricultural advisory remains limited in many developing regions due to a persistent language barrier. This paper presents a cost-efficient, cross-lingual Retrieval-Augmented Generation framework for Bengali agricultural advisory.
arXiv Detail & Related papers (2026-01-05T12:41:44Z)
Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation [75.66731090275645]
We introduce MiRAGE, an evaluation framework for retrieval-augmented generation (RAG) from multimodal sources. MiRAGE is a claim-centric approach to multimodal RAG evaluation, consisting of InfoF1, evaluating factuality and information coverage, and CiteF1, measuring citation support and completeness.
arXiv Detail & Related papers (2025-10-28T18:21:19Z)
A Multimodal Conversational Assistant for the Characterization of Agricultural Plots from Geospatial Open Data [0.0]
This study presents an open-source conversational assistant that integrates multimodal retrieval and large language models (LLMs). The proposed architecture combines orthophotos, Sentinel-2 vegetation indices, and user-provided documents through retrieval-augmented generation (RAG). Preliminary results show that the system is capable of generating clear, relevant, and context-aware responses to agricultural queries.
arXiv Detail & Related papers (2025-09-22T09:02:53Z)
Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain [1.0144032120138065]
This study generates multilingual (English, Hindi, Punjabi) synthetic datasets from agriculture-specific documents from India. Evaluation on human-created datasets demonstrates significant improvements in factuality, relevance, and agricultural consensus.
arXiv Detail & Related papers (2025-07-22T19:25:10Z)
Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG [51.120170062795566]
We propose Divide-Then-Align (DTA) to endow RAG systems with the ability to respond with "I don't know" when the query is out of the knowledge boundary. DTA balances accuracy with appropriate abstention, enhancing the reliability and trustworthiness of retrieval-augmented systems.
arXiv Detail & Related papers (2025-05-27T08:21:21Z)
Building an Efficient Multilingual Non-Profit IR System for the Islamic Domain Leveraging Multiprocessing Design in Rust [0.0]
This work focuses on the development of a multilingual non-profit IR system for the Islamic domain. By employing methods like continued pre-training for domain adaptation and language reduction to decrease model size, a lightweight multilingual retrieval model was prepared.
arXiv Detail & Related papers (2024-11-09T11:37:18Z)
Evaluating Automatic Speech Recognition Systems for Korean Meteorological Experts [48.89527378273811]
This paper explores integrating Automatic Speech Recognition into natural language query systems for Korean meteorologists. We address challenges in developing ASR systems for the Korean weather domain, specifically specialized vocabulary and Korean linguistic intricacies.
arXiv Detail & Related papers (2024-10-24T05:40:07Z)
Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian [75.94354349994576]
This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in specialized contexts. Our study concentrates on the Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models. The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting.
arXiv Detail & Related papers (2024-07-30T08:50:16Z)
R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models [51.468732121824125]
Large language models have achieved remarkable success on general NLP tasks, but they may fall short for domain-specific problems. Existing evaluation tools only provide a few baselines and evaluate them on various domains without mining the depth of domain knowledge. In this paper, we address the challenges of evaluating RALLMs by introducing the R-Eval toolkit, a Python toolkit designed to streamline the evaluation of different RAGs.
arXiv Detail & Related papers (2024-06-17T15:59:49Z)
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation. Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge. RA-LLMs have emerged to harness external and authoritative knowledge bases, rather than relying on the model's internal knowledge.
arXiv Detail & Related papers (2024-05-10T02:48:45Z)
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing [0.2302001830524133]
This survey paper addresses the absence of a comprehensive overview on Retrieval-Augmented Language Models (RALMs). The paper discusses the essential components of RALMs, including Retrievers, Language Models, and Augmentations. RALMs demonstrate utility in a spectrum of tasks, from translation and dialogue systems to knowledge-intensive applications.
arXiv Detail & Related papers (2024-04-30T13:14:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.