MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation
- URL: http://arxiv.org/abs/2601.15487v1
- Date: Wed, 21 Jan 2026 21:39:09 GMT
- Title: MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation
- Authors: Chandan Kumar Sahu, Premith Kumar Chilukuri, Matthew Hetrich,
- Abstract summary: Existing datasets often rely on general-domain corpora or purely textual retrieval. We introduce MiRAGE, a Multiagent framework for RAG systems Evaluation. MiRAGE orchestrates a swarm of specialized agents to generate verified, domain-specific, multimodal, and multi-hop Question-Answer datasets.
- Score: 0.3499870393443268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid evolution of Retrieval-Augmented Generation (RAG) toward multimodal, high-stakes enterprise applications has outpaced the development of domain-specific evaluation benchmarks. Existing datasets often rely on general-domain corpora or purely textual retrieval, failing to capture the complexity of specialized technical documents where information is inextricably multimodal and reasoning requires synthesizing disjoint evidence. We address this gap by introducing MiRAGE, a Multiagent framework for RAG systems Evaluation that leverages a collaborative swarm of specialized agents to generate verified, domain-specific, multimodal, and multi-hop Question-Answer datasets. MiRAGE orchestrates a swarm of specialized agents: a recursive context optimization loop to aggregate scattered evidence, an adversarial verifier agent to guarantee factual grounding, and an agent that recognizes the expert persona and the relevant domain to mimic expert cognitive workflows. Extensive empirical evaluation across four distinct domains (regulations, finance, quantitative biology, and journalism) demonstrates that MiRAGE generates datasets with significantly higher reasoning complexity (>2.3 average hops) and factual faithfulness. Our ablation studies indicate that MiRAGE can be powered by LLMs if textual descriptions of the images are available; visual grounding still remains a frontier. By automating the creation of gold-standard evaluation datasets that reflect the latent thematic structure of proprietary corpora, MiRAGE provides the necessary infrastructure to rigorously benchmark the next generation of information retrieval systems.
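The abstract's pipeline of a recursive context-aggregation loop feeding a generator whose output an adversarial verifier must approve can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agent names, interfaces, and the word-overlap linking heuristic are all assumptions made for the example.

```python
# Hypothetical sketch of a MiRAGE-style generate-then-verify loop.
# All agent interfaces below are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class QAPair:
    question: str
    answer: str
    evidence: list = field(default_factory=list)  # source chunks grounding the answer

def aggregate_context(corpus, seed_chunk, max_hops=3):
    """Recursive context optimization: greedily pull in chunks that share
    vocabulary with the latest evidence (a stand-in for the paper's loop)."""
    evidence = [seed_chunk]
    for _ in range(max_hops - 1):
        linked = [c for c in corpus
                  if c not in evidence and set(c.split()) & set(evidence[-1].split())]
        if not linked:
            break
        evidence.append(linked[0])
    return evidence

def generator_agent(evidence):
    # Stand-in for an LLM call that writes a multi-hop question over the evidence.
    return QAPair(
        question=f"How do these {len(evidence)} sources relate?",
        answer=" / ".join(evidence),
        evidence=evidence,
    )

def verifier_agent(qa):
    # Adversarial check: reject answers not grounded in the collected evidence.
    return all(ev in qa.answer for ev in qa.evidence)

corpus = ["pump P-101 feeds reactor R-1", "reactor R-1 vents to flare F-2"]
evidence = aggregate_context(corpus, corpus[0])
qa = generator_agent(evidence)
assert verifier_agent(qa)  # only verified pairs would enter the dataset
```

In the real system each stand-in function would be an LLM-backed agent, and rejected pairs would presumably be regenerated rather than asserted away; the sketch only shows the control flow between the three roles.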
Related papers
- MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues [15.237473092649843]
MME-RAG is a framework that decomposes entity recognition into two coordinated stages: type-level judgment by lightweight managers and span-level extraction by specialized experts. Each expert is supported by a KeyInfo retriever that injects semantically aligned, few-shot exemplars during inference, enabling precise and domain-adaptive extraction without additional training. Experiments on CrossNER, MIT-Movie, MIT-Restaurant, and our newly constructed multi-domain customer-service dataset demonstrate that MME-RAG performs better than recent baselines in most domains.
arXiv Detail & Related papers (2025-11-15T13:35:55Z) - Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation [75.66731090275645]
We introduce MiRAGE, an evaluation framework for retrieval-augmented generation (RAG) from multimodal sources. MiRAGE is a claim-centric approach to multimodal RAG evaluation, consisting of InfoF1, evaluating factuality and information coverage, and CiteF1, measuring citation support and completeness.
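Both InfoF1 and CiteF1 are described as claim-centric F1 scores. Their exact definitions live in that paper; as a rough illustration, a claim-level F1 can be computed as the standard harmonic mean of precision and recall over sets of claims, which is all the hypothetical helper below does.

```python
# Illustrative claim-level F1, in the spirit of InfoF1/CiteF1.
# This is the generic set-based F1, not the metric definitions from the paper.
def claim_f1(predicted, reference):
    """F1 between the set of claims a system produced and a reference set."""
    predicted, reference = set(predicted), set(reference)
    if not predicted or not reference:
        return 0.0
    tp = len(predicted & reference)          # claims that are actually supported
    precision = tp / len(predicted)          # how much of the output is grounded
    recall = tp / len(reference)             # how much of the reference is covered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. 2 of 3 generated claims are supported; 2 of 4 reference claims are covered
score = claim_f1({"c1", "c2", "x"}, {"c1", "c2", "c3", "c4"})
print(round(score, 3))  # → 0.571
```

In practice the matching step (deciding that a generated claim "is" a reference claim, or that a citation supports a claim) is the hard part and is typically done by an NLI model or an LLM judge rather than exact set membership.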
arXiv Detail & Related papers (2025-10-28T18:21:19Z) - Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z) - RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering [50.42577862494645]
We present RAG-IGBench, a benchmark designed to evaluate the task of Interleaved Generation based on Retrieval-Augmented Generation (RAG-IG) in open-domain question answering. RAG-IG integrates multimodal large language models (MLLMs) with retrieval mechanisms, enabling the models to access external image-text information for generating coherent multimodal content.
arXiv Detail & Related papers (2025-10-11T03:06:39Z) - MSRS: Evaluating Multi-Source Retrieval-Augmented Generation [51.717139132190574]
Many real-world applications demand the ability to integrate and summarize information scattered across multiple sources. We present a scalable framework for constructing evaluation benchmarks that challenge RAG systems to integrate information across distinct sources.
arXiv Detail & Related papers (2025-08-28T14:59:55Z) - Deep Research Agents: A Systematic Examination And Roadmap [109.53237992384872]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks. In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z) - HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation [11.53083922927901]
HM-RAG is a novel Hierarchical Multi-agent Multimodal RAG framework. It pioneers collaborative intelligence for dynamic knowledge synthesis across structured, unstructured, and graph-based data.
arXiv Detail & Related papers (2025-04-13T06:55:33Z) - Composed Multi-modal Retrieval: A Survey of Approaches and Applications [81.54640206021757]
Composed Multi-modal Retrieval (CMR) emerges as a pivotal next-generation technology. CMR enables users to query images or videos by integrating a reference visual input with textual modifications. This paper provides a comprehensive survey of CMR, covering its fundamental challenges, technical advancements, and applications.
arXiv Detail & Related papers (2025-03-03T09:18:43Z) - Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering [0.0]
In the chemical and process industries, Process Flow Diagrams (PFDs) and Piping and Instrumentation Diagrams (P&IDs) are critical for design, construction, and maintenance.
Recent advancements in Generative AI have shown promise in understanding and interpreting process diagrams for Visual Question Answering (VQA).
We propose a secure, on-premises enterprise solution using a hierarchical, multi-agent Retrieval Augmented Generation (RAG) framework.
arXiv Detail & Related papers (2024-08-24T19:34:04Z) - A Causal-based Framework for Multimodal Multivariate Time Series Validation Enhanced by Unsupervised Deep Learning as an Enabler for Industry 4.0 [0.0]
A conceptual validation framework for multi-level contextual anomaly detection is developed.
A Long Short-Term Memory Autoencoder is successfully evaluated to validate the learnt representation of contexts associated to multiple assets of a blast furnace.
A research roadmap is identified to combine causal discovery and representation learning as an enabler for unsupervised Root Cause Analysis applied to the process industry.
arXiv Detail & Related papers (2020-08-05T14:48:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.