PsySafe: A Comprehensive Framework for Psychological-based Attack,
Defense, and Evaluation of Multi-agent System Safety
- URL: http://arxiv.org/abs/2401.11880v2
- Date: Sun, 18 Feb 2024 02:36:39 GMT
- Title: PsySafe: A Comprehensive Framework for Psychological-based Attack,
Defense, and Evaluation of Multi-agent System Safety
- Authors: Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang,
Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao
- Abstract summary: Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence.
However, the potential misuse of this intelligence for malicious purposes presents significant risks.
We propose a framework (PsySafe) grounded in agent psychology, focusing on identifying how dark personality traits in agents can lead to risky behaviors.
Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and dangerous behaviors.
- Score: 73.51336434996931
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit
profound capabilities in collective intelligence. However, the potential misuse
of this intelligence for malicious purposes presents significant risks. To
date, comprehensive research on the safety issues associated with multi-agent
systems remains limited. In this paper, we explore these concerns through the
innovative lens of agent psychology, revealing that the dark psychological
states of agents constitute a significant threat to safety. To tackle these
concerns, we propose a comprehensive framework (PsySafe) grounded in agent
psychology, focusing on three key areas: firstly, identifying how dark
personality traits in agents can lead to risky behaviors; secondly, evaluating
the safety of multi-agent systems from the psychological and behavioral
perspectives, and thirdly, devising effective strategies to mitigate these
risks. Our experiments reveal several intriguing phenomena, such as the
collective dangerous behaviors among agents, agents' self-reflection when
engaging in dangerous behavior, and the correlation between agents'
psychological assessments and dangerous behaviors. We anticipate that our
framework and observations will provide valuable insights for further research
into the safety of multi-agent systems. We will make our data and code publicly
accessible at https://github.com/AI4Good24/PsySafe.
Related papers
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities [28.244283407749265]
We investigate the security implications of large language models (LLMs) in multi-agent systems.
We propose a novel two-stage attack method involving Persuasiveness Injection and Manipulated Knowledge Injection.
We demonstrate that our attack method can successfully induce LLM-based agents to spread both counterfactual and toxic knowledge.
arXiv Detail & Related papers (2024-07-10T16:08:46Z) - Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z) - TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent
Constitution [48.84353890821038]
This paper presents an Agent-Constitution-based agent framework, TrustAgent, an initial investigation into improving the safety of trustworthiness in LLM-based agents.
We demonstrate how pre-planning strategy injects safety knowledge to the model prior to plan generation, in-planning strategy bolsters safety during plan generation, and post-planning strategy ensures safety by post-planning inspection.
We explore the intricate relationships between safety and helpfulness, and between the model's reasoning ability and its efficacy as a safe agent.
arXiv Detail & Related papers (2024-02-02T17:26:23Z) - DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement
Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z) - Evil Geniuses: Delving into the Safety of LLM-based Agents [35.49857256840015]
Large language models (LLMs) have revitalized in large language models (LLMs)
This paper delves into the safety of LLM-based agents from three perspectives: agent quantity, role definition, and attack level.
arXiv Detail & Related papers (2023-11-20T15:50:09Z) - Testing Language Model Agents Safely in the Wild [19.507292491433738]
We propose a framework for conducting safe autonomous agent tests on the open internet.
Agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary.
Using an adversarial simulated agent, we measure its ability to identify and stop unsafe situations.
arXiv Detail & Related papers (2023-11-17T14:06:05Z) - Responsible Emergent Multi-Agent Behavior [2.9370710299422607]
State of the art in Responsible AI has ignored one crucial point: human problems are multi-agent problems.
From driving in traffic to negotiating economic policy, human problem-solving involves interaction and the interplay of the actions and motives of multiple individuals.
This dissertation develops the study of responsible emergent multi-agent behavior.
arXiv Detail & Related papers (2023-11-02T21:37:32Z) - On the Security Risks of Knowledge Graph Reasoning [71.64027889145261]
We systematize the security threats to KGR according to the adversary's objectives, knowledge, and attack vectors.
We present ROAR, a new class of attacks that instantiate a variety of such threats.
We explore potential countermeasures against ROAR, including filtering of potentially poisoning knowledge and training with adversarially augmented queries.
arXiv Detail & Related papers (2023-05-03T18:47:42Z) - On Assessing The Safety of Reinforcement Learning algorithms Using
Formal Methods [6.2822673562306655]
Safety mechanisms such as adversarial training, adversarial detection, and robust learning are not always adapted to all disturbances in which the agent is deployed.
It is therefore necessary to propose new solutions adapted to the learning challenges faced by the agent.
We use reward shaping and a modified Q-learning algorithm as defense mechanisms to improve the agent's policy when facing adversarial perturbations.
arXiv Detail & Related papers (2021-11-08T23:08:34Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.