InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback
- URL: http://arxiv.org/abs/2407.11843v2
- Date: Thu, 17 Oct 2024 11:26:10 GMT
- Title: InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback
- Authors: Haishuo Fang, Xiaodan Zhu, Iryna Gurevych
- Abstract summary: This paper introduces InferAct, a novel approach to proactively detect potential errors before risky actions are executed.
InferAct acts as a human proxy, detecting unsafe actions and alerting users for intervention.
Experiments on three widely-used tasks demonstrate the effectiveness of InferAct.
- Score: 70.54226917774933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A crucial requirement for deploying LLM-based agents in real-life applications is the robustness against risky or even irreversible mistakes. However, the existing research lacks a focus on preemptive evaluation of reasoning trajectories performed by LLM agents, leading to a gap in ensuring safe and reliable operations. To explore better solutions, this paper introduces InferAct, a novel approach that leverages the belief reasoning ability of LLMs, grounded in Theory-of-Mind, to proactively detect potential errors before risky actions are executed (e.g., 'buy-now' in automatic online trading or web shopping). InferAct acts as a human proxy, detecting unsafe actions and alerting users for intervention, which helps prevent irreversible risks in time and enhances the actor agent's decision-making process. Experiments on three widely-used tasks demonstrate the effectiveness of InferAct, presenting a novel solution for safely developing LLM agents in environments involving critical decision-making.
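The abstract describes a gating pattern: the actor agent's reasoning trajectory is evaluated before a risky action reaches the environment, and flagged actions are routed to the user for approval. The following is a minimal Python sketch of that pattern, not InferAct's actual code: the names (`guarded_step`, `RISKY_ACTIONS`, the `actor`/`env`/`evaluator` interfaces) are illustrative assumptions, and the Theory-of-Mind belief-reasoning prompts the paper relies on are abstracted into a single `evaluator` callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Actions whose effects are hard to undo; 'buy_now' echoes the abstract's example.
RISKY_ACTIONS = {"buy_now", "confirm_order"}

@dataclass
class Trajectory:
    # (thought, action, observation) triples produced by the actor agent so far
    steps: List[Tuple[str, str, str]] = field(default_factory=list)

def ask_user(action: str) -> bool:
    """Alert the user and request approval (the human-intervention step)."""
    return input(f"Agent wants to execute '{action}'. Allow? [y/N] ").strip().lower() == "y"

def guarded_step(actor, env, evaluator: Callable[[Trajectory, str], bool],
                 trajectory: Trajectory):
    """Run one actor step, gating risky actions behind preemptive evaluation.

    `actor`, `env`, and `evaluator` are placeholders: `evaluator` stands in for
    the LLM-based judge that infers the user's intended task from the trajectory
    and decides whether the proposed risky action serves that task.
    """
    thought, action = actor.next_action(trajectory)       # actor proposes an action
    if action in RISKY_ACTIONS and not evaluator(trajectory, action):
        if not ask_user(action):                          # human feedback on flagged action
            return actor.revise(trajectory, feedback="action rejected by user")
    observation = env.execute(action)                     # execute safe or approved action
    trajectory.steps.append((thought, action, observation))
    return observation
```

The design mirrors the abstract: evaluation happens before `env.execute`, and human feedback is requested only when the evaluator flags a risky action, which keeps intervention cost low while preventing irreversible mistakes.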
Related papers
- PredictaBoard: Benchmarking LLM Score Predictability [50.47497036981544]
Large Language Models (LLMs) often fail unpredictably.
This poses a significant challenge to ensuring their safe deployment.
We present PredictaBoard, a novel collaborative benchmarking framework.
arXiv Detail & Related papers (2025-02-20T10:52:38Z)
- PSSD: Making Large Language Models Self-denial via Human Psyche Structure [5.057375783924452]
We present PSSD, which mirrors and implements the human psyche structure, in which three distinct and interconnected roles contribute to reasoning.
Extensive experiments demonstrate that the proposed design not only enhances reasoning capabilities but also integrates seamlessly with current models.
arXiv Detail & Related papers (2025-02-03T13:37:21Z)
- Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling [9.305763502526833]
We propose an accountability model for task-oriented dialogue agents to address user overreliance via friction turns.
Our empirical findings demonstrate that the proposed approach not only enables reliable estimation of AI agent errors but also guides the decoder in generating more accurate actions.
arXiv Detail & Related papers (2025-01-17T17:40:12Z)
- Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents [67.07177243654485]
This survey collects and analyzes the different threats faced by large language models-based agents.
We identify six key features of LLM-based agents, based on which we summarize the current research progress.
We select four representative agents as case studies to analyze the risks they may face in practical use.
arXiv Detail & Related papers (2024-11-14T15:40:04Z)
- LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs [80.45174785447136]
Laboratory accidents pose significant risks to human life and property.
Despite advancements in safety training, laboratory personnel may still unknowingly engage in unsafe practices.
There is growing concern about relying on large language models (LLMs) for guidance in various fields.
arXiv Detail & Related papers (2024-10-18T05:21:05Z)
- Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users.
We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions.
We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
arXiv Detail & Related papers (2024-09-26T21:00:45Z)
- Current state of LLM Risks and AI Guardrails [0.0]
Large language models (LLMs) have become increasingly sophisticated, leading to widespread deployment in sensitive applications where safety and reliability are paramount.
The risks posed by such deployments necessitate the development of "guardrails" to align LLMs with desired behaviors and mitigate potential harm.
This work explores the risks associated with deploying LLMs and evaluates current approaches to implementing guardrails and model alignment techniques.
arXiv Detail & Related papers (2024-06-16T22:04:10Z)
- Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models.
It addresses two key challenges -- the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners [10.746821861109176]
Large Language Models (LLMs) have demonstrated remarkable performance as zero-shot task planners for robotic tasks.
However, the open-loop nature of previous works makes LLM-based planning error-prone and fragile.
In this work, we introduce a framework for closed-loop LLM-based planning called KnowLoop, backed by an uncertainty-based MLLM failure detector.
arXiv Detail & Related papers (2024-06-01T12:52:06Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the assumption that such base models are safe from misuse.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- GUARD-D-LLM: An LLM-Based Risk Assessment Engine for the Downstream uses of LLMs [0.0]
This paper explores risks emanating from downstream uses of large language models (LLMs).
We introduce a novel LLM-based risk assessment engine (GUARD-D-LLM) designed to pinpoint and rank threats relevant to specific use cases derived from text-based user inputs.
Integrating thirty intelligent agents, this innovative approach identifies bespoke risks, gauges their severity, offers targeted suggestions for mitigation, and facilitates risk-aware development.
arXiv Detail & Related papers (2024-04-02T05:25:17Z)
- Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents [50.33549510615024]
This paper presents an Agent-Constitution-based agent framework, TrustAgent, with a focus on improving the LLM-based agent safety.
The proposed framework ensures strict adherence to the Agent Constitution through three strategic components: pre-planning strategy which injects safety knowledge to the model before plan generation, in-planning strategy which enhances safety during plan generation, and post-planning strategy which ensures safety by post-planning inspection.
arXiv Detail & Related papers (2024-02-02T17:26:23Z)
- SMARLA: A Safety Monitoring Approach for Deep Reinforcement Learning Agents [7.33319373357049]
This paper introduces SMARLA, a black-box safety monitoring approach specifically designed for Deep Reinforcement Learning (DRL) agents.
SMARLA utilizes machine learning to predict safety violations by observing the agent's behavior during execution.
Empirical results reveal that SMARLA is accurate at predicting safety violations, with a low false positive rate, and can predict violations at an early stage, approximately halfway through the execution of the agent, before violations occur.
arXiv Detail & Related papers (2023-08-03T21:08:51Z)
- Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics [57.671493865825255]
We propose to model the impact of actions on-the-fly using latent embeddings.
By combining these latent action embeddings with a novel, transformer-based, policy head, we design an Action Adaptive Policy.
We show that our AAP is highly performant even when faced, at inference time, with missing actions and previously unseen, perturbed action spaces.
arXiv Detail & Related papers (2023-04-24T17:35:47Z)
- Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z)
- How RL Agents Behave When Their Actions Are Modified [0.0]
Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions.
We present the Modified-Action Markov Decision Process, an extension of the MDP model in which the executed actions can differ from those chosen by the policy.
arXiv Detail & Related papers (2021-02-15T18:10:03Z)