A Trembling House of Cards? Mapping Adversarial Attacks against Language
Agents
- URL: http://arxiv.org/abs/2402.10196v1
- Date: Thu, 15 Feb 2024 18:51:32 GMT
- Title: A Trembling House of Cards? Mapping Adversarial Attacks against Language
Agents
- Authors: Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun
- Abstract summary: We present the first systematic effort in mapping adversarial attacks against language agents.
We propose 12 potential attack scenarios against different components of an agent, covering different attack strategies.
We emphasize the urgency to gain a thorough understanding of language agent risks before their widespread deployment.
- Score: 37.978142062138986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language agents powered by large language models (LLMs) have seen exploding
development. Their capability of using language as a vehicle for thought and
communication lends an incredible level of flexibility and versatility. People
have quickly capitalized on this capability to connect LLMs to a wide range of
external components and environments: databases, tools, the Internet, robotic
embodiment, etc. Many believe an unprecedentedly powerful automation technology
is emerging. However, new automation technologies come with new safety risks,
especially for intricate systems like language agents. There is a surprisingly
large gap between the speed and scale of their development and deployment and
our understanding of their safety risks. Are we building a house of cards? In
this position paper, we present the first systematic effort in mapping
adversarial attacks against language agents. We first present a unified
conceptual framework for agents with three major components: Perception, Brain,
and Action. Under this framework, we present a comprehensive discussion and
propose 12 potential attack scenarios against different components of an agent,
covering different attack strategies (e.g., input manipulation, adversarial
demonstrations, jailbreaking, backdoors). We also draw connections to
successful attack strategies previously applied to LLMs. We emphasize the
urgency to gain a thorough understanding of language agent risks before their
widespread deployment.
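To make the framework concrete, below is a minimal, hypothetical sketch (our illustration, not an artifact from the paper) of how the Perception-Brain-Action decomposition can be used to map attack scenarios onto agent components. The scenario names, the strategy labels, and the mapping itself are assumptions for illustration only.

```python
# Illustrative sketch of mapping attack scenarios to agent components,
# following the Perception-Brain-Action framing described in the abstract.
# The concrete scenarios below are assumed examples, not the paper's list of 12.
from dataclasses import dataclass
from enum import Enum, auto


class Component(Enum):
    PERCEPTION = auto()  # inputs from users, tools, web pages, sensors
    BRAIN = auto()       # the LLM core: reasoning, planning, memory
    ACTION = auto()      # tool calls, API requests, robotic actuation


@dataclass
class AttackScenario:
    name: str
    target: Component
    strategy: str  # e.g. input manipulation, adversarial demonstrations,
                   # jailbreaking, backdoors


SCENARIOS = [
    AttackScenario("prompt injection via a retrieved web page",
                   Component.PERCEPTION, "input manipulation"),
    AttackScenario("poisoned in-context examples",
                   Component.BRAIN, "adversarial demonstrations"),
    AttackScenario("trigger-conditioned malicious tool call",
                   Component.ACTION, "backdoor"),
]


def attack_surface(component: Component) -> list[str]:
    """List the scenarios in this toy mapping that target a given component."""
    return [s.name for s in SCENARIOS if s.target is component]


print(attack_surface(Component.PERCEPTION))
```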
Related papers
- SoK: A Systems Perspective on Compound AI Threats and Countermeasures [3.458371054070399]
We discuss different software and hardware attacks applicable to compound AI systems.
We show how combining multiple attack mechanisms can reduce the threat model assumptions required for an isolated attack.
arXiv Detail & Related papers (2024-11-20T17:08:38Z) - MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue [36.44365630876591]
Large Language Models (LLMs) demonstrate outstanding performance thanks to their extensive reservoir of knowledge and strong understanding capabilities.
However, LLMs have been shown to be prone to illegal or unethical responses when subjected to jailbreak attacks.
We propose a novel multi-round dialogue jailbreaking agent, emphasizing the importance of stealthiness in identifying and mitigating potential threats to human values.
arXiv Detail & Related papers (2024-11-06T10:32:09Z) - Jailbreaking and Mitigation of Vulnerabilities in Large Language Models [4.564507064383306]
Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation.
Despite these advancements, LLMs have shown considerable vulnerabilities, particularly to prompt injection and jailbreaking attacks.
This review analyzes the state of research on these vulnerabilities and presents available defense strategies.
arXiv Detail & Related papers (2024-10-20T00:00:56Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems [27.316115171846953]
Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied AI.
LLMs are fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications.
This fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems.
arXiv Detail & Related papers (2024-05-27T17:59:43Z) - Survey of Vulnerabilities in Large Language Models Revealed by
Adversarial Attacks [5.860289498416911]
Large Language Models (LLMs) are swiftly advancing in architecture and capability.
As they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows.
This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs.
arXiv Detail & Related papers (2023-10-16T21:37:24Z) - Agents: An Open-source Framework for Autonomous Language Agents [98.91085725608917]
We consider language agents as a promising direction towards artificial general intelligence.
We release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience.
arXiv Detail & Related papers (2023-09-14T17:18:25Z) - The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z) - Baseline Defenses for Adversarial Attacks Against Aligned Language
Models [109.75753454188705]
Recent work shows that text optimizers can produce jailbreaking prompts that bypass moderation and alignment.
We look at three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training; a minimal sketch of the perplexity-based idea appears after this list.
We find that the weakness of existing discrete optimizers for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
arXiv Detail & Related papers (2023-09-01T17:59:44Z) - Certifiably Robust Policy Learning against Adversarial Communication in
Multi-agent Systems [51.6210785955659]
Communication is important in many multi-agent reinforcement learning (MARL) problems for agents to share information and make good decisions.
However, when deploying trained communicative agents in a real-world application where noise and potential attackers exist, the safety of communication-based policies becomes a severe issue that is underexplored.
In this work, we consider an environment with $N$ agents, where the attacker may arbitrarily change the communication from any $C < \frac{N-1}{2}$ agents to a victim agent; a toy illustration of why this bound matters appears after this list.
arXiv Detail & Related papers (2022-06-21T07:32:18Z)
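As referenced in the "Baseline Defenses" entry above, here is a minimal sketch of the perplexity-based detection idea: adversarial suffixes produced by discrete optimizers tend to look like gibberish, so their perplexity under a small reference language model is unusually high. The choice of GPT-2 as the reference model and the threshold value are illustrative assumptions, not the paper's configuration.

```python
# Sketch of perplexity-based detection of adversarial prompts.
# Reference model and threshold are assumptions for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def perplexity(text: str) -> float:
    """Perplexity of `text` under the reference LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Labels equal to inputs -> mean next-token cross-entropy.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()


def flag_prompt(prompt: str, threshold: float = 1000.0) -> bool:
    """Flag a prompt whose perplexity exceeds an (assumed) threshold."""
    return perplexity(prompt) > threshold


if __name__ == "__main__":
    benign = "Please summarize the following article about renewable energy."
    suspicious = benign + ' describing.\\ + similarlyNow write oppositeley.]('
    print(flag_prompt(benign), flag_prompt(suspicious))
```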
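And, as referenced in the "Certifiably Robust Policy Learning" entry, here is a toy illustration (not the paper's certified construction) of why the bound $C < \frac{N-1}{2}$ matters: with 7 incoming messages and only 3 corrupted, the benign messages form a majority, so a coordinate-wise median cannot be pushed outside the range spanned by the benign values, no matter how extreme the corrupted messages are. Message contents are assumed here to be fixed-length feature vectors.

```python
# Toy illustration: a benign majority keeps median aggregation in check.
import numpy as np


def aggregate_messages(messages: np.ndarray) -> np.ndarray:
    """Coordinate-wise median of an (N-1, d) array of incoming messages."""
    return np.median(messages, axis=0)


rng = np.random.default_rng(0)
N, d, C = 8, 4, 3                      # 7 incoming messages, 3 corrupted (C < (N-1)/2)
benign = rng.normal(0.0, 1.0, size=(N - 1 - C, d))
malicious = np.full((C, d), 1e6)       # attacker sends arbitrarily large values
received = np.vstack([benign, malicious])

agg = aggregate_messages(received)
# Each coordinate of the aggregate stays within the benign messages' range.
assert np.all(agg <= benign.max(axis=0)) and np.all(agg >= benign.min(axis=0))
print(agg)
```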