MRAG-Suite: A Diagnostic Evaluation Platform for Visual Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2509.24253v1
- Date: Mon, 29 Sep 2025 03:55:28 GMT
- Title: MRAG-Suite: A Diagnostic Evaluation Platform for Visual Retrieval-Augmented Generation
- Authors: Yuelyu Ji,
- Abstract summary: We propose MRAG-Suite, a diagnostic evaluation platform integrating diverse multimodal benchmarks. We introduce difficulty-based and ambiguity-aware filtering strategies, alongside MM-RAGChecker, a claim-level diagnostic tool.
- Score: 1.9036571490366498
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Retrieval-Augmented Generation (Visual RAG) significantly advances question answering by integrating visual and textual evidence. Yet, current evaluations fail to systematically account for query difficulty and ambiguity. We propose MRAG-Suite, a diagnostic evaluation platform integrating diverse multimodal benchmarks (WebQA, Chart-RAG, Visual-RAG, MRAG-Bench). We introduce difficulty-based and ambiguity-aware filtering strategies, alongside MM-RAGChecker, a claim-level diagnostic tool. Our results demonstrate substantial accuracy reductions under difficult and ambiguous queries, highlighting prevalent hallucinations. MM-RAGChecker effectively diagnoses these issues, guiding future improvements in Visual RAG systems.
Related papers
- RAG-X: Systematic Diagnosis of Retrieval-Augmented Generation for Medical Question Answering [3.5512614044131596]
We propose RAG-X, a diagnostic framework that evaluates the retriever and generator independently across a triad of QA tasks. By surfacing hidden failure modes, RAG-X offers the diagnostic transparency needed for safe and verifiable clinical RAG systems.
arXiv Detail & Related papers (2026-03-03T21:55:42Z)
- RAGExplorer: A Visual Analytics System for the Comparative Diagnosis of RAG Systems [12.726326169727733]
RAGExplorer is a visual analytics system for the systematic comparison and diagnosis of RAG configurations. We demonstrate the effectiveness of RAGExplorer through detailed case studies and user studies.
arXiv Detail & Related papers (2026-01-19T12:09:56Z)
- VisRAG 2.0: Evidence-Guided Multi-Image Reasoning in Visual Retrieval-Augmented Generation [64.82775032985485]
Visual retrieval-augmented generation (VRAG) augments vision-language models (VLMs) with external visual knowledge to ground reasoning and reduce hallucinations. Yet current VRAG systems often fail to reliably perceive and integrate evidence across multiple images, leading to weak grounding and erroneous conclusions. We propose EVisRAG, an end-to-end framework that learns to reason with evidence-guided multi-image to address this issue.
arXiv Detail & Related papers (2025-10-10T13:34:23Z)
- MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot [47.77948063906033]
Retrieval-augmented generation (RAG) is a well-suited technique for retrieving privacy-sensitive Electronic Health Records. This paper proposes MedRAG, a RAG model enhanced by knowledge graph (KG)-elicited reasoning for the medical domain. Tests show MedRAG provides more specific diagnostic insights and outperforms state-of-the-art models in reducing misdiagnosis rates.
arXiv Detail & Related papers (2025-02-06T12:27:35Z)
- Unanswerability Evaluation for Retrieval Augmented Generation [74.3022365715597]
UAEval4RAG is a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively. We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries.
arXiv Detail & Related papers (2024-12-16T19:11:55Z)
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs).
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
- MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses. We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z)
- RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation [61.14660526363607]
We propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the retrieval and generation modules.
RAGChecker has significantly better correlations with human judgments than other evaluation metrics.
The metrics of RAGChecker can guide researchers and practitioners in developing more effective RAG systems.
arXiv Detail & Related papers (2024-08-15T10:20:54Z)