Related papers: Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems

Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems

URL: http://arxiv.org/abs/2506.03901v2
Date: Thu, 05 Jun 2025 06:44:23 GMT
Title: Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems
Authors: Yuxin Zhang, Yan Wang, Yongrui Chen, Shenyu Zhang, Xinbang Dai, Sheng Bi, Guilin Qi,
Abstract summary: Existing benchmarks fail to emulate the complex and heterogeneous noise distributions encountered in real-world retrieval environments.<n>We introduce Magic Mushroom, a benchmark for replicating "magic mushroom" noise.<n>Magic Mushroom emerges as a promising tool for evaluating and advancing noise-robust RAG systems.
Score: 16.058785648585605
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external retrieved information, mitigating issues such as hallucination and outdated knowledge. However, RAG systems are highly sensitive to retrieval noise prevalent in real-world scenarios. Existing benchmarks fail to emulate the complex and heterogeneous noise distributions encountered in real-world retrieval environments, undermining reliable robustness assessment. In this paper, we define four categories of retrieval noise based on linguistic properties and noise characteristics, aiming to reflect the heterogeneity of noise in real-world scenarios. Building on this, we introduce Magic Mushroom, a benchmark for replicating "magic mushroom" noise: contexts that appear relevant on the surface but covertly mislead RAG systems. Magic Mushroom comprises 7,468 single-hop and 3,925 multi-hop question-answer pairs. More importantly, Magic Mushroom enables researchers to flexibly configure combinations of retrieval noise according to specific research objectives or application scenarios, allowing for highly controlled evaluation setups. We evaluate LLM generators of varying parameter scales and classic RAG denoising strategies under diverse noise distributions to investigate their performance dynamics during progressive noise encroachment. Our analysis reveals that both generators and denoising strategies have significant room for improvement and exhibit extreme sensitivity to noise distributions. Magic Mushroom emerges as a promising tool for evaluating and advancing noise-robust RAG systems, accelerating their widespread deployment in real-world applications. The Magic Mushroom benchmark is available at https://drive.google.com/file/d/1aP5kyPuk4L-L_uoI6T9UhxuTyt8oMqjT/view?usp=sharing.

Related papers

Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios [76.02688769599686]
We propose a novel multi-view clustering framework for the automatic identification and rectification of noisy data, termed AIRMVC.<n>Specifically, we reformulate noisy identification as an anomaly identification problem using GMM.<n>We then design a hybrid rectification strategy to mitigate the adverse effects of noisy data based on the identification results.
arXiv Detail & Related papers (2025-05-27T16:16:54Z)
Enhance Vision-Language Alignment with Noise [59.2608298578913]
We investigate whether the frozen model can be fine-tuned by customized noise.<n>We propose Positive-incentive Noise (PiNI) which can fine-tune CLIP via injecting noise into both visual and text encoders.
arXiv Detail & Related papers (2024-12-14T12:58:15Z)
Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models [25.044751882839886]
Retrieval-Augmented Generation (RAG) has emerged as a crucial method for addressing hallucinations in large language models (LLMs)<n>In this paper, we define seven distinct noise types from a linguistic perspective and establish a Noise RAG Benchmark (NoiserBench)<n>Our analysis offers insights for developing more robust, adaptable RAG solutions and mitigating hallucinations across diverse retrieval scenarios.
arXiv Detail & Related papers (2024-08-24T09:23:01Z)
Robust Learning under Hybrid Noise [24.36707245704713]
We propose a novel unified learning framework called "Feature and Label Recovery" (FLR) to combat the hybrid noise from the perspective of data recovery.
arXiv Detail & Related papers (2024-07-04T16:13:25Z)
Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training [39.21885486667879]
Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. We propose a novel RAG approach known as Retrieval-augmented Adaptive Adrial Training (RAAT)
arXiv Detail & Related papers (2024-05-31T16:24:53Z)
Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought [0.0]
We study how noise in chain of thought impacts task performance in highly-controlled setting. We define two types of noise: textitstatic noise, a local form of noise which is applied after the CoT trace is computed, and textitdynamic noise, a global form of noise which propagates errors in the trace as it is computed. We find fine-tuned models are extremely robust to high levels of static noise but struggle significantly more with lower levels of dynamic noise.
arXiv Detail & Related papers (2024-02-06T13:59:56Z)
Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC) NPC consists of a detection module and a correction module. We conduct experiments on text-retrieval benchmarks Natural Question and TriviaQA, code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
An Investigation of Noise in Morphological Inflection [21.411766936034]
We investigate the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion. We compare the effect of different types of noise on multiple state-of-the-art inflection models. We propose a novel character-level masked language modeling (CMLM) pretraining objective and explore its impact on the models' resistance to noise.
arXiv Detail & Related papers (2023-05-26T02:14:34Z)
Inference and Denoise: Causal Inference-based Neural Speech Enhancement [83.4641575757706]
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention. The proposed causal inference-based speech enhancement (CISE) separates clean and noisy frames in an intervened noisy speech using a noise detector and assigns both sets of frames to two mask-based enhancement modules (EMs) to perform noise-conditional SE.
arXiv Detail & Related papers (2022-11-02T15:03:50Z)
Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a textitgap between clean data training and real-world inference. We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedding into similar vector space. Experiments on the widely-used dataset, Snips, and large scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpus but also enhances the robustness, that is, it produces high-quality results under a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
Learning based signal detection for MIMO systems with unknown noise statistics [84.02122699723536]
This paper aims to devise a generalized maximum likelihood (ML) estimator to robustly detect signals with unknown noise statistics. In practice, there is little or even no statistical knowledge on the system noise, which in many cases is non-Gaussian, impulsive and not analyzable. Our framework is driven by an unsupervised learning approach, where only the noise samples are required.
arXiv Detail & Related papers (2021-01-21T04:48:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.