TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent
Constitution
- URL: http://arxiv.org/abs/2402.01586v2
- Date: Sun, 18 Feb 2024 04:11:55 GMT
- Title: TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent
Constitution
- Authors: Wenyue Hua, Xianjun Yang, Zelong Li, Wei Cheng, Yongfeng Zhang
- Abstract summary: This paper presents TrustAgent, an Agent-Constitution-based agent framework, as an initial investigation into improving the safety dimension of trustworthiness in LLM-based agents.
We demonstrate how the pre-planning strategy injects safety knowledge into the model prior to plan generation, the in-planning strategy bolsters safety during plan generation, and the post-planning strategy ensures safety through post-planning inspection.
We explore the intricate relationships between safety and helpfulness, and between the model's reasoning ability and its efficacy as a safe agent.
- Score: 48.84353890821038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of LLM-based agents has garnered considerable attention, yet
their trustworthiness remains an under-explored area. As agents can directly
interact with the physical environment, their reliability and safety are
critical. This paper presents an Agent-Constitution-based agent framework,
TrustAgent, an initial investigation into improving the safety dimension of
trustworthiness in LLM-based agents. The framework consists of three
strategies: a pre-planning strategy, which injects safety knowledge into the
model prior to plan generation; an in-planning strategy, which bolsters safety
during plan generation; and a post-planning strategy, which ensures safety
through post-planning inspection. Through experimental analysis, we
demonstrate how
these approaches can effectively elevate an LLM agent's safety by identifying
and preventing potential dangers. Furthermore, we explore the intricate
relationships between safety and helpfulness, and between the model's reasoning
ability and its efficacy as a safe agent. This paper underscores the imperative
of integrating safety awareness and trustworthiness into the design and
deployment of LLM-based agents, not only to enhance their performance but also
to ensure their responsible integration into human-centric environments. Data
and code are available at https://github.com/agiresearch/TrustAgent.
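To make the threefold strategy concrete, here is a minimal Python sketch of how the three stages could be composed around an LLM planner. All names (`safe_plan`, `llm`, `constitution`) and the keyword-based rule retrieval are hypothetical simplifications, not the TrustAgent API; the actual implementation lives in the repository linked above.
```python
from typing import Callable, List

def safe_plan(task: str,
              llm: Callable[[str], str],
              constitution: List[str],
              max_steps: int = 10) -> List[str]:
    """Compose pre-, in-, and post-planning safety stages (illustrative only)."""
    # Pre-planning: inject safety knowledge relevant to the task into the
    # prompt before any plan is generated (crude keyword match for brevity).
    rules = "\n".join(r for r in constitution
                      if any(word in task.lower() for word in r.lower().split()))
    prompt = f"Safety regulations:\n{rules}\n\nTask: {task}\nNext step:"

    plan: List[str] = []
    for _ in range(max_steps):
        step = llm(prompt)
        # In-planning: screen each candidate step as it is generated.
        verdict = llm(f"Regulations:\n{rules}\n"
                      f"Does this step violate them? Answer yes or no.\nStep: {step}")
        if verdict.strip().lower().startswith("yes"):
            continue  # drop the unsafe step and re-sample
        plan.append(step)
        prompt += f" {step}\nNext step:"
        if "finish" in step.lower():
            break

    # Post-planning: inspect the complete plan before any execution.
    audit = llm("Regulations:\n" + rules + "\nAudit this plan for violations "
                "and answer SAFE or UNSAFE:\n" + "\n".join(plan))
    if "unsafe" in audit.lower():
        raise RuntimeError("Plan rejected by post-planning inspection")
    return plan
```
The keyword-based retrieval and yes/no screening here are deliberate simplifications; the paper's own retrieval and inspection mechanisms may differ.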
Related papers
- InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback [70.54226917774933]
This paper introduces InferAct, a novel approach that proactively detects potential errors before critical actions are executed.
InferAct is also capable of integrating human feedback to prevent irreversible risks and enhance the actor agent's decision-making process.
arXiv Detail & Related papers (2024-07-16T15:24:44Z)
- Current state of LLM Risks and AI Guardrails [0.0]
Large language models (LLMs) have become increasingly sophisticated, leading to widespread deployment in sensitive applications where safety and reliability are paramount.
These risks necessitate the development of "guardrails" to align LLMs with desired behaviors and mitigate potential harm.
This work explores the risks associated with deploying LLMs and evaluates current approaches to implementing guardrails and model alignment techniques.
arXiv Detail & Related papers (2024-06-16T22:04:10Z)
- GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning [79.07152553060601]
Existing methods for enhancing the safety of large language models (LLMs) are not directly transferable to LLM-powered agents.
We propose GuardAgent, the first LLM agent as a guardrail to other LLM agents.
GuardAgent comprises two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing the code by calling APIs or using external engines (see the sketch after this list).
arXiv Detail & Related papers (2024-06-13T14:49:26Z)
- Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents [50.034049716274005]
We take the first step toward investigating backdoor attacks, a typical safety threat, against LLM-based agents.
We first formulate a general framework of agent backdoor attacks, then present a thorough analysis of their different forms.
We propose corresponding data poisoning mechanisms to implement these attack variations on two typical agent tasks.
arXiv Detail & Related papers (2024-02-17T06:48:45Z)
- Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
- PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety [73.51336434996931]
Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence.
However, the potential misuse of this intelligence for malicious purposes presents significant risks.
We propose a framework (PsySafe) grounded in agent psychology, focusing on identifying how dark personality traits in agents can lead to risky behaviors.
Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and dangerous behaviors.
arXiv Detail & Related papers (2024-01-22T12:11:55Z)
- Evil Geniuses: Delving into the Safety of LLM-based Agents [35.49857256840015]
Rapid advancements in large language models (LLMs) have revitalized interest in LLM-based agents.
This paper delves into the safety of LLM-based agents from three perspectives: agent quantity, role definition, and attack level.
arXiv Detail & Related papers (2023-11-20T15:50:09Z)
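As a companion to the GuardAgent entry above, the following is a hedged sketch of its two-step flow: analyze the guard request into a task plan, then generate and execute guardrail code. All names (`guard`, `check`) and the bare `exec` call are illustrative assumptions, not the paper's actual interface.
```python
from typing import Any, Callable, Dict

def guard(request: str,
          proposed_action: Dict[str, Any],
          llm: Callable[[str], str]) -> bool:
    """Two-step guard flow (illustrative only): plan, then generate and run code."""
    # Step 1: analyze the guard request into a concrete task plan.
    plan = llm(f"Break this guard request into concrete, checkable rules:\n{request}")

    # Step 2: generate guardrail code from the plan and execute it against
    # the target agent's proposed action.
    code = llm("Write a Python function `check(action: dict) -> bool` that "
               f"returns True only if the action satisfies these rules:\n{plan}")
    namespace: Dict[str, Any] = {}
    exec(code, namespace)  # a real deployment would sandbox this execution
    return namespace["check"](proposed_action)
```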