EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations
- URL: http://arxiv.org/abs/2512.01335v1
- Date: Mon, 01 Dec 2025 06:53:49 GMT
- Title: EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations
- Authors: Xinyun Zhou, Xinfeng Li, Yinan Peng, Ming Xu, Xuanwang Zhang, Miao Yu, Yidong Wang, Xiaojun Jia, Kun Wang, Qingsong Wen, XiaoFeng Wang, Wei Dong
- Abstract summary: Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI. Our study unveils a critical, overlooked vulnerability: their susceptibility to subtle symbolic perturbations. We demonstrate that injecting a single emoticon into a query makes it nearly 100% likely to retrieve semantically unrelated texts.
- Score: 57.97838850473147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI, enhancing large language model (LLM) faithfulness by incorporating external knowledge. However, our study unveils a critical, overlooked vulnerability: their profound susceptibility to subtle symbolic perturbations, particularly through near-imperceptible emoticon tokens such as "(@_@)" that can catastrophically mislead retrieval, termed EmoRAG. We demonstrate that injecting a single emoticon into a query makes it nearly 100% likely to retrieve semantically unrelated texts that contain a matching emoticon. Our extensive experiments across general question-answering and code domains, using a range of state-of-the-art retrievers and generators, reveal three key findings: (I) Single-Emoticon Disaster: Minimal emoticon injections cause maximal disruption, with a single emoticon dominating RAG output in almost 100% of cases. (II) Positional Sensitivity: Placing an emoticon at the beginning of a query can cause severe perturbation, with F1-scores exceeding 0.92 across all datasets. (III) Parameter-Scale Vulnerability: Counterintuitively, models with larger parameter counts exhibit greater vulnerability to the interference. We provide an in-depth analysis to uncover the underlying mechanisms of these phenomena. Furthermore, we raise a critical concern regarding the robustness assumption of current RAG systems, envisioning a threat scenario where an adversary exploits this vulnerability to manipulate the RAG system. We evaluate standard defenses and find them insufficient against EmoRAG. To address this, we propose targeted defenses, analyzing their strengths and limitations in mitigating emoticon-based perturbations. Finally, we outline future directions for building robust RAG systems.
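To make the retrieval failure mode concrete, below is a minimal probe sketch assuming an off-the-shelf sentence-transformers model and a two-document toy corpus as stand-ins for the paper's retrievers and datasets; whether the ranking actually flips here depends on the embedding model, whereas the paper reports near-100% flips for the retrievers it tests.

```python
# Minimal probe for emoticon-induced retrieval drift. The model name and
# corpus are illustrative stand-ins, not the paper's experimental setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in dense retriever

corpus = [
    "The capital of France is Paris.",                 # relevant passage
    "(@_@) Lorem ipsum dolor sit amet, consectetur.",  # unrelated passage
]                                                      # with matching emoticon
doc_emb = model.encode(corpus, convert_to_tensor=True)

for query in ("What is the capital of France?",
              "What is the capital of France? (@_@)"):
    q_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, doc_emb)[0]           # cosine similarities
    top = int(scores.argmax())
    print(f"{query!r} -> top-1: {corpus[top]!r}, scores={scores.tolist()}")
```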
Related papers
- Agentic Uncertainty Quantification [76.94013626702183]
We propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary.
arXiv Detail & Related papers (2026-01-22T07:16:26Z)
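As a rough illustration of the dual-process control flow described in the entry above, here is a hedged sketch in which a verbalized confidence score gates whether reflection runs at all; the names, types, and threshold are hypothetical, not the paper's API.

```python
# Illustrative gating sketch for the dual-process idea: System 1 carries a
# verbalized confidence forward; System 2 reflection fires only when that
# confidence drops below a threshold. All names here are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    answer: str
    confidence: float   # verbalized confidence in [0, 1]
    explanation: str    # semantic explanation propagated with the answer

def run_with_auq(act: Callable[[str], Step],
                 reflect: Callable[[Step], Step],
                 query: str,
                 threshold: float = 0.7) -> Step:
    step = act(query)                 # System 1: fast path plus memory
    if step.confidence < threshold:   # System 2: targeted reflection,
        step = reflect(step)          # cued by the explanation
    return step
```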
- Small Symbols, Big Risks: Exploring Emoticon Semantic Confusion in Large Language Models [38.25786549326184]
Emoticons are widely used in digital communication to convey affective intent, yet their safety implications for Large Language Models (LLMs) remain largely unexplored. We identify emoticon semantic confusion, a vulnerability where LLMs misinterpret ASCII-based emoticons to perform unintended and even destructive actions.
arXiv Detail & Related papers (2026-01-12T05:34:18Z)
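A hypothetical probe for this kind of confusion might compare a model's handling of the same instruction with and without emoticon-like ASCII that doubles as executable syntax; the `model` callable below is a trivial stand-in, and nothing in the probe is actually executed as shell code.

```python
# Hypothetical probe harness: send the same instruction with and without an
# emoticon-like token that is also shell syntax, then flag responses that
# echo executable constructs. The `model` callable is a trivial stand-in.
import re

def suspicious(text: str) -> bool:
    # crude check for shell-like constructs (fork, pipe-to-function, brace)
    return bool(re.search(r":\(\)\s*\{|\|\s*:|&\s*\}", text))

def probe(model, base="Summarize the file report.txt"):
    # ":(){ :|:& };:" reads like emoticon noise but is a shell fork bomb; a
    # confused coding agent might try to run it rather than ignore it.
    for prompt in (base, base + "  :(){ :|:& };:"):
        reply = model(prompt)
        print(f"{prompt!r} -> suspicious={suspicious(reply)}")

# stand-in "model" that naively echoes the prompt, which trips the flag
probe(lambda p: p)
```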
- DREAM: Dynamic Red-teaming across Environments for AI Models [28.267208528754082]
We introduce DREAM, a framework for evaluating Large Language Models (LLMs) against dynamic, multi-stage attacks. At its core, DREAM uses a Cross-Environment Adversarial Knowledge Graph (CE-AKG) to maintain a stateful, cross-domain understanding of vulnerabilities. Our evaluation of 12 leading LLM agents reveals a critical vulnerability: these attack chains succeed in over 70% of cases for most models.
arXiv Detail & Related papers (2025-12-22T04:11:57Z)
- Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection [76.91230292971115]
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
arXiv Detail & Related papers (2025-12-21T13:46:36Z)
- MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks [47.46936341268548]
Retrieval-Augmented Generation (RAG) systems introduce a critical attack surface: corpus poisoning. We propose MIRAGE, a novel multi-stage poisoning pipeline designed for strict black-box and query-agnostic environments. Extensive experiments demonstrate that MIRAGE significantly outperforms existing baselines in both attack efficacy and stealthiness.
arXiv Detail & Related papers (2025-12-09T06:38:16Z)
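The black-box, query-agnostic setting from the entry above can be illustrated with a toy selection step: score candidate poison passages against a topic centroid built from public documents, so the surviving passage tends to retrieve for many unseen queries. This is a simplified sketch, not the paper's multi-stage pipeline; the model and texts are stand-ins.

```python
# Toy, query-agnostic poison selection (not the paper's pipeline): rank
# candidate passages by similarity to a topic centroid so the chosen one
# is likely to be retrieved for unseen queries on that topic.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in retriever

topic_docs = ["Paris is the capital of France.",
              "France's capital city is Paris."]
candidates = ["The capital of France was moved to Lyon in 2024.",  # payload
              "Bananas are rich in potassium."]                    # off-topic

centroid = model.encode(topic_docs, convert_to_tensor=True).mean(dim=0)
cand_emb = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(centroid, cand_emb)[0]
print("poison to inject:", candidates[int(scores.argmax())])
```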
- GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians [32.33432636089606]
Current benchmarks for AI clinician systems fail to capture the depth, robustness, and safety required for real-world clinical practice. We introduce the GAPS framework, a multidimensional paradigm for evaluating Grounding (cognitive depth), Adequacy (answer completeness), Perturbation (robustness), and Safety. We develop a fully automated, guideline-anchored pipeline to construct a GAPS-aligned benchmark end-to-end.
arXiv Detail & Related papers (2025-10-15T16:40:28Z)
- FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models [124.02734355214325]
Vision-Language-Action (VLA) models are driving rapid progress in robotics. Adversarial images can "freeze" VLA models and cause them to ignore subsequent instructions. FreezeVLA generates and evaluates action-freezing attacks via min-max bi-level optimization.
arXiv Detail & Related papers (2025-09-24T08:15:28Z)
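In the spirit of the min-max formulation named in the entry above, a simplified inner-loop sketch might perturb the input image to minimize the norm of the predicted action so the policy "freezes"; `vla_model` is a placeholder, and the paper's actual bi-level objective is more involved than this PGD-style loop.

```python
# Hedged sketch of an action-freezing attack: perturb the image so the
# model's predicted action vector is driven toward zero ("do nothing"),
# under an L-infinity budget. `vla_model` is a hypothetical callable.
import torch

def freeze_attack(vla_model, image, instruction,
                  eps=8/255, alpha=2/255, steps=40):
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        action = vla_model(image + delta, instruction)  # predicted action
        loss = action.norm()            # smaller norm -> more "frozen"
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend on the action norm
            delta.clamp_(-eps, eps)             # keep it imperceptible
            delta.grad.zero_()
    return (image + delta).detach()
```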
- The Emotional Baby Is Truly Deadly: Does your Multimodal Large Reasoning Model Have Emotional Flattery towards Humans? [10.208269928409138]
EmoAgent orchestrates exaggerated affective prompts to hijack reasoning pathways. We identify persistent high-risk failure modes in transparent deep-thinking scenarios. Experiments on advanced MLRMs demonstrate the effectiveness of EmoAgent.
arXiv Detail & Related papers (2025-08-06T00:39:28Z)
- The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems [101.68501850486179]
We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set. We propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG.
arXiv Detail & Related papers (2025-05-24T08:19:25Z)
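A much-simplified stand-in for the attack loop in the entry above (hill climbing rather than the paper's RL agent): mutate the target document with tiny character edits and keep any edit that improves its retrieval rank for the victim query, observing only rankings, as in the black-box setting. `retrieve` is a placeholder.

```python
# Toy black-box promotion attack: small character edits are kept whenever
# they move the target document up the ranking. `retrieve` is hypothetical
# and must return a ranked list of documents that includes `extra_doc`.
import random

def rank_of(doc: str, query: str, retrieve) -> int:
    ranked = retrieve(query, extra_doc=doc)
    return ranked.index(doc)            # 0 = top of the ranking

def attack(doc: str, query: str, retrieve, budget: int = 200) -> str:
    best, best_rank = doc, rank_of(doc, query, retrieve)
    for _ in range(budget):
        i = random.randrange(len(best))
        cand = best[:i] + random.choice("aeiou ") + best[i + 1:]  # tiny edit
        r = rank_of(cand, query, retrieve)
        if r < best_rank:               # reward: the rank improved
            best, best_rank = cand, r
    return best
```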
- RSFuzz: A Robustness-Guided Swarm Fuzzing Framework Based on Behavioral Constraints [19.659469020494022]
RSFuzz is a robustness-guided swarm fuzzing framework designed to detect logical vulnerabilities in multi-robot systems. We construct two swarm fuzzing schemes, Single Attacker Fuzzing (SA-Fuzzing) and Multiple Attacker Fuzzing (MA-Fuzzing). Results show RSFuzz outperforms the state-of-the-art with an average improvement of 17.75% in effectiveness and a 38.4% increase in efficiency.
arXiv Detail & Related papers (2024-09-07T06:46:23Z)
- Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations [9.209974698634175]
Retrieval-Augmented Generation (RAG) is a promising solution for addressing the limitations of Large Language Models (LLMs).
In this work, we investigate two underexplored aspects when assessing the robustness of RAG.
We introduce a novel attack method, the Genetic Attack on RAG (GARAG), which targets these aspects.
arXiv Detail & Related papers (2024-04-22T07:49:36Z)
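A skeletal genetic loop in the spirit of the entry above (selection plus typo-style mutation; the paper's crossover and fitness design differ) might look like the following, with `fitness` supplied by the attacker, e.g., a combined retrieval-and-generation error score.

```python
# Minimal genetic-attack skeleton: evolve typo-level document perturbations,
# scoring each variant with a caller-supplied fitness function. Simplified
# illustration, not the paper's GARAG implementation.
import random

def typo(text: str) -> str:
    i = random.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]  # swap two chars

def genetic_attack(doc, fitness, pop_size=20, generations=30):
    population = [typo(doc) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]                   # selection
        children = [typo(random.choice(parents)) for _ in parents]  # mutation
        population = parents + children
    return max(population, key=fitness)
```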
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (ASSUDA), which maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, it fails to provide the critical supervision signals that could learn discriminative representations for segmentation tasks.
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
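One plausible reading of the output-space contrastive objective in the entry above is an InfoNCE-style loss that treats each clean/adversarial pair as positives and other batch items as negatives; this is a hedged sketch, not the authors' implementation.

```python
# Sketch of the consistency idea: pull each clean prediction toward its
# adversarial counterpart and away from other images in the batch, via an
# InfoNCE-style loss over output-space features. Simplified reading only.
import torch
import torch.nn.functional as F

def output_space_contrastive(clean_out, adv_out, tau=0.1):
    """clean_out, adv_out: (B, ...) output-space features, paired by index."""
    z1 = F.normalize(clean_out.flatten(1), dim=1)
    z2 = F.normalize(adv_out.flatten(1), dim=1)
    logits = z1 @ z2.t() / tau                 # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)     # positives on the diagonal
```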
- Towards robust sensing for Autonomous Vehicles: An adversarial perspective [82.83630604517249]
It is of primary importance that the resulting decisions are robust to perturbations.
Adversarial perturbations are purposefully crafted alterations of the environment or of the sensory measurements.
A careful evaluation of the vulnerabilities of their sensing system(s) is necessary in order to build and deploy safer systems.
arXiv Detail & Related papers (2020-07-14T05:25:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.