A Trembling House of Cards? Mapping Adversarial Attacks against Language
Agents
- URL: http://arxiv.org/abs/2402.10196v1
- Date: Thu, 15 Feb 2024 18:51:32 GMT
- Title: A Trembling House of Cards? Mapping Adversarial Attacks against Language
Agents
- Authors: Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun
- Abstract summary: We present the first systematic effort in mapping adversarial attacks against language agents.
We propose 12 potential attack scenarios against different components of an agent, covering different attack strategies.
We emphasize the urgency to gain a thorough understanding of language agent risks before their widespread deployment.
- Score: 37.978142062138986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language agents powered by large language models (LLMs) have seen exploding
development. Their capability of using language as a vehicle for thought and
communication lends an incredible level of flexibility and versatility. People
have quickly capitalized on this capability to connect LLMs to a wide range of
external components and environments: databases, tools, the Internet, robotic
embodiment, etc. Many believe an unprecedentedly powerful automation technology
is emerging. However, new automation technologies come with new safety risks,
especially for intricate systems like language agents. There is a surprisingly
large gap between the speed and scale of their development and deployment and
our understanding of their safety risks. Are we building a house of cards? In
this position paper, we present the first systematic effort in mapping
adversarial attacks against language agents. We first present a unified
conceptual framework for agents with three major components: Perception, Brain,
and Action. Under this framework, we present a comprehensive discussion and
propose 12 potential attack scenarios against different components of an agent,
covering different attack strategies (e.g., input manipulation, adversarial
demonstrations, jailbreaking, backdoors). We also draw connections to
successful attack strategies previously applied to LLMs. We emphasize the
urgency to gain a thorough understanding of language agent risks before their
widespread deployment.
Related papers
- Red-Teaming LLM Multi-Agent Systems via Communication Attacks [10.872328358364776]
Large Language Model-based Multi-Agent Systems (LLM-MAS) have revolutionized complex problem-solving capability by enabling sophisticated agent collaboration through message-based communications.
We introduce Agent-in-the-Middle (AiTM), a novel attack that exploits the fundamental communication mechanisms in LLM-MAS by intercepting and manipulating inter-agent messages.
arXiv Detail & Related papers (2025-02-20T18:55:39Z) - Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs)
In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents.
We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z) - A Survey on Backdoor Threats in Large Language Models (LLMs): Attacks, Defenses, and Evaluations [4.332547891216645]
We adapt the general taxonomy for classifying machine learning attacks on one of the subdivisions - training-time white-box backdoor attacks.
We also consider the corresponding defense methods against backdoor attacks.
arXiv Detail & Related papers (2025-02-06T04:43:05Z) - MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue [35.7801861576917]
Large Language Models (LLMs) demonstrate outstanding performance in their reservoir of knowledge and understanding capabilities.
LLMs have been shown to be prone to illegal or unethical reactions when subjected to jailbreak attacks.
We propose a novel multi-round dialogue jailbreaking agent, emphasizing the importance of stealthiness in identifying and mitigating potential threats to human values.
arXiv Detail & Related papers (2024-11-06T10:32:09Z) - Jailbreaking and Mitigation of Vulnerabilities in Large Language Models [4.564507064383306]
Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation.
Despite these advancements, LLMs have shown considerable vulnerabilities, particularly to prompt injection and jailbreaking attacks.
This review analyzes the state of research on these vulnerabilities and presents available defense strategies.
arXiv Detail & Related papers (2024-10-20T00:00:56Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - Survey of Vulnerabilities in Large Language Models Revealed by
Adversarial Attacks [5.860289498416911]
Large Language Models (LLMs) are swiftly advancing in architecture and capability.
As they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows.
This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs.
arXiv Detail & Related papers (2023-10-16T21:37:24Z) - Agents: An Open-source Framework for Autonomous Language Agents [98.91085725608917]
We consider language agents as a promising direction towards artificial general intelligence.
We release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience.
arXiv Detail & Related papers (2023-09-14T17:18:25Z) - The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI)
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z) - Baseline Defenses for Adversarial Attacks Against Aligned Language
Models [109.75753454188705]
Recent work shows that text moderations can produce jailbreaking prompts that bypass defenses.
We look at three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training.
We find that the weakness of existing discretes for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
arXiv Detail & Related papers (2023-09-01T17:59:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.