Related papers: REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark

REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark

URL: http://arxiv.org/abs/2502.12342v1
Date: Mon, 17 Feb 2025 22:10:47 GMT
Title: REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark
Authors: Navve Wasserman, Roi Pony, Oshri Naparstek, Adi Raz Goldfarb, Eli Schwartz, Udi Barzelay, Leonid Karlinsky,
Abstract summary: We introduce REAL-MM-RAG, an automatically generated benchmark designed to address four key properties essential for real-world retrieval.<n>We propose a multi-difficulty-level scheme based on query rephrasing to evaluate models' semantic understanding beyond keyword matching.<n>Our benchmark reveals significant model weaknesses, particularly in handling table-heavy documents and robustness to query rephrasing.
Score: 16.55516587540082
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate multi-modal document retrieval is crucial for Retrieval-Augmented Generation (RAG), yet existing benchmarks do not fully capture real-world challenges with their current design. We introduce REAL-MM-RAG, an automatically generated benchmark designed to address four key properties essential for real-world retrieval: (i) multi-modal documents, (ii) enhanced difficulty, (iii) Realistic-RAG queries and (iv) accurate labeling. Additionally, we propose a multi-difficulty-level scheme based on query rephrasing to evaluate models' semantic understanding beyond keyword matching. Our benchmark reveals significant model weaknesses, particularly in handling table-heavy documents and robustness to query rephrasing. To mitigate these shortcomings, we curate a rephrased training set and introduce a new finance-focused, table-heavy dataset. Fine-tuning on these datasets enables models to achieve state-of-the-art retrieval performance on REAL-MM-RAG benchmark. Our work offers a better way to evaluate and improve retrieval in multi-modal RAG systems while also providing training data and models that address current limitations.

Related papers

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.<n>It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.<n>We introduce VerifierBench benchmark comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z)
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs [11.861763118322136]
We introduce mmRAG, a modular benchmark for evaluating multi-modal RAG systems.<n>Our benchmark integrates queries from six diverse question-answering datasets spanning text, tables, and knowledge graphs.<n>We follow standard information retrieval procedures to annotate document relevance and derive dataset relevance.
arXiv Detail & Related papers (2025-05-16T12:31:29Z)
A Multi-Granularity Retrieval Framework for Visually-Rich Documents [4.804551482123172]
We propose a unified multi-granularity multimodal retrieval framework tailored for two benchmark tasks: MMDocIR and M2KR.<n>Our approach integrates hierarchical encoding strategies, modality-aware retrieval mechanisms, and vision-language model (VLM)-based candidate filtering.<n>Our framework demonstrates robust performance without the need for task-specific fine-tuning.
arXiv Detail & Related papers (2025-05-01T02:40:30Z)
MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework [15.410873298893817]
We propose Multi-Modal Knowledge-Based Retrieval-Augmented Generation (MMKB-RAG) This framework leverages the inherent knowledge boundaries of models to dynamically generate semantic tags for the retrieval process. Extensive experiments on knowledge-based visual question-answering tasks demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2025-04-14T10:19:47Z)
MultiConIR: Towards multi-condition Information Retrieval [57.6405602406446]
We introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios. We propose three tasks to assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity.
arXiv Detail & Related papers (2025-03-11T05:02:03Z)
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts [56.30364248231053]
This paper introduces Multi-Modal Retrieval-Augmented Generation (M2RAG) M2RAG is a benchmark designed to evaluate the effectiveness of Multi-modal Large Language Models (MLLMs) To enhance the context utilization capabilities of MLLMs, we also introduce Multi-Modal Retrieval-Augmented Instruction Tuning (MM-RAIT)
arXiv Detail & Related papers (2025-02-24T16:25:25Z)
Cross-Format Retrieval-Augmented Generation in XR with LLMs for Context-Aware Maintenance Assistance [6.16808916207942]
This paper presents a detailed evaluation of a Retrieval-Augmented Generation system that integrates large language models (LLMs) We assess the performance of eight LLMs, emphasizing key metrics such as response speed and accuracy, which were quantified using BLEU and METEOR scores. The results validate the system's ability to deliver timely and accurate responses, highlighting the potential of RAG frameworks to optimize maintenance operations.
arXiv Detail & Related papers (2025-02-21T17:19:39Z)
MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation [34.66546005629471]
Large Language Models (LLMs) are essential tools for various natural language processing tasks but often suffer from generating outdated or incorrect information.<n>Retrieval-Augmented Generation (RAG) addresses this issue by incorporating external, real-time information retrieval to ground LLM responses.<n>To tackle this problem, we propose Multi-Agent Filtering Retrieval-Augmented Generation (MAIN-RAG)<n>MAIN-RAG is a training-free RAG framework that leverages multiple LLM agents to collaboratively filter and score retrieved documents.
arXiv Detail & Related papers (2024-12-31T08:07:26Z)
Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs [23.357843519762483]
Recent studies have demonstrated that leveraging the Retrieval-Augmented Generation framework, combined with Knowledge Graphs, robustly enhances the reasoning capabilities of Large language models.<n>We introduce a Multi-objective Multi-Armed Bandit enhanced RAG framework, supported by multiple retrieval methods with diverse capabilities.<n>Our method significantly outperforms baseline methods in non-stationary settings while achieving state-of-the-art performance in stationary environments.
arXiv Detail & Related papers (2024-12-10T15:56:03Z)
Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels [64.94853276821992]
Large multimodal models (LMMs) are increasingly deployed across diverse applications.<n>Traditional evaluation methods are largely dataset-centric, relying on fixed, labeled datasets and supervised metrics.<n>We explore unsupervised model ranking for LMMs by leveraging their uncertainty signals, such as softmax probabilities.
arXiv Detail & Related papers (2024-12-09T13:05:43Z)
Self-adaptive Multimodal Retrieval-Augmented Generation [0.0]
We propose a new approach called Self-adaptive Multimodal Retrieval-Augmented Generation (SAM-RAG) SAM-RAG not only dynamically filters relevant documents based on the input query, including image captions when needed, but also verifies the quality of both the retrieved documents and the output. Extensive experimental results show that SAM-RAG surpasses existing state-of-the-art methods in both retrieval accuracy and response generation.
arXiv Detail & Related papers (2024-10-15T06:39:35Z)
Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs [67.54302101989542]
Legal case retrieval aims to provide similar cases as references for a given fact description. Existing works mainly focus on case-to-case retrieval using lengthy queries. Data scale is insufficient to satisfy the training requirements of existing data-hungry neural models.
arXiv Detail & Related papers (2024-10-09T06:26:39Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs [13.638439488923671]
Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) Existing RAG solutions do not focus on queries that may require fetching multiple documents with substantially different contents. This paper introduces Multi-Head RAG (MRAG), a novel scheme designed to address this gap with a simple yet powerful idea.
arXiv Detail & Related papers (2024-06-07T16:59:38Z)
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA) We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity. We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation [51.99752147380505]
This paper presents a benchmark self-evolving framework to dynamically evaluate Large Language Models (LLMs) We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence. Our framework widens performance discrepancies both between different models and within the same model across various tasks.
arXiv Detail & Related papers (2024-02-18T03:40:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.