InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback
- URL: http://arxiv.org/abs/2407.11843v1
- Date: Tue, 16 Jul 2024 15:24:44 GMT
- Title: InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback
- Authors: Haishuo Fang, Xiaodan Zhu, Iryna Gurevych
- Abstract summary: This paper introduces InferAct, a novel approach that proactively detects potential errors before critical actions are executed.
InferAct is also capable of integrating human feedback to prevent irreversible risks and enhance the actor agent's decision-making process.
- Score: 70.54226917774933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A crucial requirement for deploying LLM-based agents in real-life applications is robustness against risky or irreversible mistakes. However, existing research lacks a focus on the preemptive evaluation of reasoning trajectories performed by LLM agents, leading to a gap in ensuring safe and reliable operations. To explore better solutions, this paper introduces InferAct, a novel approach that leverages the Theory-of-Mind capability of LLMs to proactively detect potential errors before critical actions are executed (e.g., "buy-now" in automatic online trading or web shopping). InferAct is also capable of integrating human feedback to prevent irreversible risks and enhance the actor agent's decision-making process. Experiments on three widely used tasks demonstrate the effectiveness of InferAct. The proposed solution presents a novel approach and concrete contributions toward developing LLM agents that can be safely deployed in different environments involving critical decision-making.
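The abstract describes a preemptive-evaluation loop: before a critical, hard-to-reverse action is executed, an evaluator inspects the actor's reasoning trajectory and, if it suspects an error, defers to a human. The sketch below is a minimal illustration of that control flow only; the function names, the keyword-based evaluator stub, and the action list are hypothetical and are not taken from the InferAct paper, whose evaluator is an LLM using Theory-of-Mind-style inference.

```python
# Illustrative sketch of a preemptive-evaluation gate (hypothetical API;
# names and logic are NOT from the InferAct paper).

CRITICAL_ACTIONS = {"buy-now", "confirm-order", "delete-account"}

def evaluate_trajectory(trajectory):
    """Stand-in for an LLM-based evaluator that infers whether the actor's
    reasoning trajectory is likely to end in a mistake. This stub simply
    flags trajectories whose last thought mentions an unverified assumption."""
    return "unverified" in trajectory[-1].lower()

def step(actor_action, trajectory, ask_human):
    """Gate a proposed action: execute it directly unless it is critical
    AND the evaluator suspects an error, in which case defer to a human."""
    if actor_action in CRITICAL_ACTIONS and evaluate_trajectory(trajectory):
        return "execute" if ask_human(actor_action, trajectory) else "abort"
    return "execute"

decision = step(
    "buy-now",
    ["searched for running shoes", "unverified: size matches the request"],
    ask_human=lambda action, trail: False,  # human rejects the risky action
)
print(decision)  # abort
```

The point of the gate is that human attention is spent only on actions that are both critical and flagged, so routine steps proceed without interruption.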
Related papers
- Current state of LLM Risks and AI Guardrails [0.0]
Large language models (LLMs) have become increasingly sophisticated, leading to widespread deployment in sensitive applications where safety and reliability are paramount.
These risks necessitate the development of "guardrails" to align LLMs with desired behaviors and mitigate potential harm.
This work explores the risks associated with deploying LLMs and evaluates current approaches to implementing guardrails and model alignment techniques.
arXiv Detail & Related papers (2024-06-16T22:04:10Z)
- Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners [10.746821861109176]
Large Language Models (LLMs) have witnessed remarkable performance as zero-shot task planners for robotic tasks.
However, the open-loop nature of previous works makes LLM-based planning error-prone and fragile.
In this work, we introduce KnowLoop, a framework for closed-loop LLM-based planning backed by an uncertainty-based MLLM failure detector.
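The KnowLoop summary describes closing the planning loop with an uncertainty-based failure detector. One common way to operationalize this is to treat high predictive entropy as a failure signal and ask for help instead of executing. The sketch below illustrates that thresholding idea only; the function names and the threshold value are hypothetical, not the paper's actual detector.

```python
# Illustrative sketch of uncertainty-thresholded closed-loop planning
# (hypothetical names; not the KnowLoop implementation).
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def closed_loop_step(plan_step, probs, threshold=1.0):
    """If the planner's predictive entropy exceeds the threshold, treat the
    step as a likely failure and ask for help instead of executing it."""
    if token_entropy(probs) > threshold:
        return ("ask-for-help", plan_step)
    return ("execute", plan_step)

# A confident distribution executes; a near-uniform one asks for help.
print(closed_loop_step("pick up the cup", [0.97, 0.01, 0.01, 0.01]))
print(closed_loop_step("pick up the cup", [0.25, 0.25, 0.25, 0.25]))
```

The threshold trades off autonomy against safety: lowering it routes more steps to a human, raising it lets more uncertain steps execute.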
arXiv Detail & Related papers (2024-06-01T12:52:06Z)
- Controlling Large Language Model Agents with Entropic Activation Steering [20.56909601159833]
We study how large language models (LLMs) form and act on beliefs by conducting experiments in controlled sequential decision-making tasks.
We show that LLM agents are overconfident: They draw strong conclusions about what to do based on insufficient evidence, resulting in inadequately explorative behavior.
We introduce Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents.
arXiv Detail & Related papers (2024-06-01T00:25:00Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
However, our investigation exposes a critical oversight in this optimism.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Optimization-based Prompt Injection Attack to LLM-as-a-Judge [78.20257854455562]
We introduce JudgeDeceiver, a novel optimization-based prompt injection attack tailored to LLM-as-a-Judge.
Our method formulates a precise optimization objective for attacking the decision-making process of LLM-as-a-Judge.
Compared to handcrafted prompt injection attacks, our method demonstrates superior efficacy.
arXiv Detail & Related papers (2024-03-26T13:58:00Z)
- Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution [48.84353890821038]
This paper presents TrustAgent, an Agent-Constitution-based agent framework, as an initial investigation into improving the safety and trustworthiness of LLM-based agents.
We demonstrate how the pre-planning strategy injects safety knowledge into the model prior to plan generation, the in-planning strategy bolsters safety during plan generation, and the post-planning strategy ensures safety through post-planning inspection.
We explore the intricate relationships between safety and helpfulness, and between the model's reasoning ability and its efficacy as a safe agent.
arXiv Detail & Related papers (2024-02-02T17:26:23Z)
- AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [76.95062553043607]
Evaluating large language models (LLMs) is essential for understanding their capabilities and facilitating their integration into practical applications.
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents.
arXiv Detail & Related papers (2024-01-24T01:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.