Related papers: Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

URL: http://arxiv.org/abs/2407.07791v2
Date: Tue, 23 Jul 2024 01:59:54 GMT
Title: Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities
Authors: Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, Gongshen Liu,
Abstract summary: We investigate the security implications of large language models (LLMs) in multi-agent systems. We propose a novel two-stage attack method involving Persuasiveness Injection and Manipulated Knowledge Injection. We demonstrate that our attack method can successfully induce LLM-based agents to spread both counterfactual and toxic knowledge.
Score: 28.244283407749265
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in various applications, such as collaborative problem-solving and autonomous negotiation. However, the security implications of these LLM-based multi-agent systems have not been thoroughly investigated, particularly concerning the spread of manipulated knowledge. In this paper, we investigate this critical issue by constructing a detailed threat model and a comprehensive simulation environment that mirrors real-world multi-agent deployments in a trusted platform. Subsequently, we propose a novel two-stage attack method involving Persuasiveness Injection and Manipulated Knowledge Injection to systematically explore the potential for manipulated knowledge (i.e., counterfactual and toxic knowledge) spread without explicit prompt manipulation. Our method leverages the inherent vulnerabilities of LLMs in handling world knowledge, which can be exploited by attackers to unconsciously spread fabricated information. Through extensive experiments, we demonstrate that our attack method can successfully induce LLM-based agents to spread both counterfactual and toxic knowledge without degrading their foundational capabilities during agent communication. Furthermore, we show that these manipulations can persist through popular retrieval-augmented generation frameworks, where several benign agents store and retrieve manipulated chat histories for future interactions. This persistence indicates that even after the interaction has ended, the benign agents may continue to be influenced by manipulated knowledge. Our findings reveal significant security risks in LLM-based multi-agent systems, emphasizing the imperative need for robust defenses against manipulated knowledge spread, such as introducing ``guardian'' agents and advanced fact-checking tools.

Related papers

LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions [80.12078194093013]
We present the first comprehensive survey of hallucinations in LLM-based agents.<n>We propose a new taxonomy that identifies different types of agent hallucinations occurring at different stages.<n>We conduct an in-depth examination of eighteen triggering causes underlying the emergence of agent hallucinations.
arXiv Detail & Related papers (2025-09-23T13:24:48Z)
Can an Individual Manipulate the Collective Decisions of Multi-Agents? [53.01767232004823]
M-Spoiler is a framework that simulates agent interactions within a multi-agent system to generate adversarial samples.<n>M-Spoiler introduces a stubborn agent that actively aids in optimizing adversarial samples.<n>Our findings confirm the risks posed by the knowledge of an individual agent in multi-agent systems.
arXiv Detail & Related papers (2025-09-20T01:54:20Z)
Risk Analysis Techniques for Governed LLM-based Multi-Agent Systems [0.0]
This report addresses the early stages of risk identification and analysis for multi-agent AI systems.<n>We examine six critical failure modes: cascading reliability failures, inter-agent communication failures, monoculture collapse, conformity bias, deficient theory of mind, and mixed motive dynamics.
arXiv Detail & Related papers (2025-08-06T06:06:57Z)
The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover [0.18472148461613155]
Large Language Model (LLM) agents and multi-agent systems introduce unprecedented security vulnerabilities.<n>This paper presents a comprehensive evaluation of the security of LLMs used as reasoning engines within autonomous agents.<n>We focus on how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers.
arXiv Detail & Related papers (2025-07-09T13:54:58Z)
SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents.<n>It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information.<n>It also designs an automated assisted assessment scheme based on a large language model.
arXiv Detail & Related papers (2025-07-01T15:10:00Z)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) equipped with Retrieval Augmented Generation (RAG) RAG enhances MLLMs by grounding responses in query-relevant external knowledge. This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks. We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation [4.241100280846233]
AI agents, powered by large language models (LLMs), have transformed human-computer interactions by enabling seamless, natural, and context-aware communication. This paper investigates a critical vulnerability: adversarial attacks targeting the LLM core within AI agents.
arXiv Detail & Related papers (2024-12-05T18:38:30Z)
Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents [67.07177243654485]
This survey collects and analyzes the different threats faced by large language models-based agents. We identify six key features of LLM-based agents, based on which we summarize the current research progress. We select four representative agents as case studies to analyze the risks they may face in practical use.
arXiv Detail & Related papers (2024-11-14T15:40:04Z)
Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication. In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness. Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems [6.480532634073257]
We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents. This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption. To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread.
arXiv Detail & Related papers (2024-10-09T11:01:29Z)
Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence. This paper uncovers a significant backdoor security threat within this process. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification [35.16099878559559]
Large language models (LLMs) have experienced significant development and are being deployed in real-world applications. We introduce a new type of attack that causes malfunctions by misleading the agent into executing repetitive or irrelevant actions. Our experiments reveal that these attacks can induce failure rates exceeding 80% in multiple scenarios.
arXiv Detail & Related papers (2024-07-30T14:35:31Z)
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs [4.927763944523323]
The advent of Large Language Models (LLMs) has garnered significant popularity and wielded immense power across various domains within Natural Language Processing (NLP) This paper will synthesize the findings from each vulnerability section and propose new directions of research and development. By understanding the focal points of current vulnerabilities, we can better anticipate and mitigate future risks.
arXiv Detail & Related papers (2024-07-30T04:08:00Z)
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents [54.09074527006576]
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges. This inadequacy primarily stems from the lack of built-in action knowledge in language agents. We introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge.
arXiv Detail & Related papers (2024-03-05T16:39:12Z)
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative [55.08395463562242]
Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI) Our paper explores a novel vulnerability in MLLM societies - the indirect propagation of malicious content.
arXiv Detail & Related papers (2024-02-20T23:08:21Z)
Exploring the Adversarial Capabilities of Large Language Models [25.7847594292453]
Large language models (LLMs) can craft adversarial examples out of benign samples to fool existing safe rails. Our experiments, which focus on hate speech detection, reveal that LLMs succeed in finding adversarial perturbations, effectively undermining hate speech detection systems.
arXiv Detail & Related papers (2024-02-14T12:28:38Z)
On the Risk of Misinformation Pollution with Large Language Models [127.1107824751703]
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation. Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of Open-Domain Question Answering (ODQA) systems.
arXiv Detail & Related papers (2023-05-23T04:10:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.