"Trust me on this" Explaining Agent Behavior to a Human Terminator
- URL: http://arxiv.org/abs/2504.04592v2
- Date: Mon, 05 May 2025 17:48:40 GMT
- Title: "Trust me on this" Explaining Agent Behavior to a Human Terminator
- Authors: Uri Menkes, Assaf Hallak, Ofra Amir
- Abstract summary: We formalize this setup and propose an explainability scheme to help optimize the number of human interventions.
- Score: 7.7559527224629266
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Consider a setting where a pre-trained agent operates in an environment and a human operator can decide to temporarily terminate its operation and take over for some duration. These kinds of scenarios are common in human-machine interaction, for example in autonomous driving, factory automation, and healthcare. In these settings, we typically observe a trade-off between two extreme cases: if no take-overs are allowed, the agent might employ a sub-optimal, possibly dangerous policy; if there are too many take-overs, the human has no confidence in the agent, greatly limiting its usefulness. In this paper, we formalize this setup and propose an explainability scheme to help optimize the number of human interventions.
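The trade-off described in the abstract can be made concrete with a toy simulation. This is an illustrative sketch only, not the paper's formalization: `agent_risk` and `takeover_threshold` are hypothetical parameters standing in for the agent's unreliability and the operator's trust level.

```python
import random

def run_episode(agent_risk, takeover_threshold, steps=1000, seed=0):
    """Toy roll-out of the take-over trade-off described above.

    agent_risk: upper bound on the per-step probability that the agent's
        next action is unsafe.
    takeover_threshold: operator trust level; lower values mean the human
        takes over more often.
    Returns (failures, interventions).
    """
    rng = random.Random(seed)
    failures = interventions = 0
    for _ in range(steps):
        risk = rng.random() * agent_risk        # perceived risk of next action
        if risk > takeover_threshold:           # operator takes over
            interventions += 1                  # human acts safely instead
        elif rng.random() < risk:               # agent acts and may fail
            failures += 1
    return failures, interventions

# Lowering the threshold trades failures for interventions:
for thr in (0.01, 0.05, 0.2):
    print(thr, run_episode(agent_risk=0.1, takeover_threshold=thr))
```

A permissive operator (high threshold) never intervenes but absorbs every agent failure; an anxious one (low threshold) intervenes constantly, which is exactly the confidence-versus-safety tension the paper's explainability scheme aims to optimize.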
Related papers
- The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy [9.553819152637493]
We study a minimal control interface where an agent chooses whether to act autonomously (play) or defer (ask).
If the agent defers, the human's choice determines the outcome, potentially leading to a corrective action or a system shutdown.
Our analysis focuses on cases where this game qualifies as a Markov Potential Game (MPG), a class of games for which we can provide an alignment guarantee.
arXiv Detail & Related papers (2025-10-30T17:46:49Z) - How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations [112.57167042285437]
We study how agents do human work by presenting the first direct comparison of human and agent workers.
We find that agents deliver results 88.3% faster and cost 90.4-96.2% less than humans.
arXiv Detail & Related papers (2025-10-26T18:10:22Z) - Ethics2vec: aligning automatic agents and human preferences [0.19580473532948395]
This paper proposes a way to map an automatic agent's decision-making (or control-law) strategy to a multivariate vector representation.
The Ethics2Vec method is first introduced in the case of an automatic agent performing binary decision-making.
arXiv Detail & Related papers (2025-08-11T06:52:46Z) - Kaleidoscopic Teaming in Multi Agent Simulations [75.47388708240042]
We argue that existing red teaming or safety evaluation frameworks fall short in evaluating safety risks in the complex behaviors, thought processes, and actions taken by agents.
We introduce new in-context optimization techniques that can be used in our kaleidoscopic teaming framework to generate better scenarios for safety analysis.
We present appropriate metrics that can be used along with our framework to measure the safety of agents.
arXiv Detail & Related papers (2025-06-20T23:37:17Z) - Levels of Autonomy for AI Agents [9.324309359500198]
We argue that an agent's level of autonomy can be treated as a deliberate design decision, separate from its capability and operational environment.
We define five levels of escalating agent autonomy, characterized by the roles a user can take when interacting with an agent.
We highlight a potential application of our framework towards AI autonomy certificates to govern agent behavior in single- and multi-agent systems.
arXiv Detail & Related papers (2025-06-14T12:14:36Z) - When Should We Orchestrate Multiple Agents? [74.27052374196269]
Strategies for orchestrating the interactions between multiple agents, both human and artificial, can wildly overestimate performance and underestimate the cost of orchestration.
We design a framework to orchestrate agents under realistic conditions, such as inference costs or availability constraints.
We show theoretically that orchestration is only effective if there are performance or cost differentials between agents.
arXiv Detail & Related papers (2025-03-17T14:26:07Z) - AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents [75.85554113398626]
We develop a benchmark called AgentDAM to evaluate how well existing and future AI agents can limit the processing of potentially private information.
Our benchmark simulates realistic web interaction scenarios and is adaptable to all existing web navigation agents.
arXiv Detail & Related papers (2025-03-12T19:30:31Z) - Uncertainty Comes for Free: Human-in-the-Loop Policies with Diffusion Models [3.076241811701216]
We propose a method that allows diffusion policies to actively seek human assistance only when necessary, reducing reliance on constant human oversight.
We leverage the generative process of diffusion policies to compute an uncertainty-based metric based on which the autonomous agent can decide to request operator assistance at deployment time.
We show that the same method can be used for efficient data collection for fine-tuning diffusion policies in order to improve their autonomous performance.
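The idea of gating operator requests on a sample-based uncertainty signal can be sketched as follows. This is a hypothetical illustration, not the paper's metric: the actual method derives uncertainty from the diffusion generative process, whereas here we simply measure disagreement across repeated stochastic action samples, with `threshold` and `n_samples` as assumed parameters.

```python
import numpy as np

def should_request_help(sample_action, n_samples=16, threshold=0.5, seed=0):
    """Hypothetical uncertainty gate in the spirit of the paper above:
    draw several actions from a stochastic (e.g. diffusion) policy and
    request operator assistance when they disagree too much.

    sample_action: callable(rng) -> action vector (one stochastic sample).
    Returns (uncertainty, ask).
    """
    rng = np.random.default_rng(seed)
    actions = np.stack([sample_action(rng) for _ in range(n_samples)])
    uncertainty = float(actions.std(axis=0).mean())  # spread across samples
    return uncertainty, uncertainty > threshold

# A confident policy (low spread) acts autonomously; a diffuse one defers.
confident = lambda rng: np.array([1.0, 0.0]) + 0.01 * rng.standard_normal(2)
diffuse = lambda rng: rng.standard_normal(2)

print(should_request_help(confident))  # low spread -> ask is False
print(should_request_help(diffuse))    # high spread -> ask is True
```

The same signal could flag states worth adding to a fine-tuning dataset, mirroring the data-collection use mentioned in the abstract.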
arXiv Detail & Related papers (2025-02-26T15:12:29Z) - When Trust is Zero Sum: Automation Threat to Epistemic Agency [15.3187914835649]
Even in cases where workers keep their jobs, their agency within them might be severely downgraded.
Job retention focused solutions, such as designing an algorithm to work alongside the human employee, may only enable these harms.
arXiv Detail & Related papers (2024-08-16T17:10:19Z) - Human-compatible driving partners through data-regularized self-play reinforcement learning [3.9682126792844583]
Human-Regularized PPO (HR-PPO) is a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy.
Results show our HR-PPO agents are highly effective in achieving goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%.
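The human-regularization idea in HR-PPO, a small penalty for deviating from a human reference policy, can be illustrated with a minimal sketch. This is not the authors' code: the KL form, the coefficient `beta`, and the discrete action logits are assumptions for illustration, and the surrounding PPO machinery is omitted.

```python
import numpy as np

def hr_ppo_penalty(agent_logits, human_logits, beta=0.1):
    """Illustrative human-regularization term in the spirit of HR-PPO:
    a KL divergence between the learning policy and a fixed human
    reference policy, scaled by a small coefficient beta and added
    to the usual PPO loss (PPO itself is omitted here).
    """
    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()
    p = softmax(agent_logits)   # current policy over discrete actions
    q = softmax(human_logits)   # frozen human reference policy
    kl = float(np.sum(p * np.log(p / q)))
    return beta * kl            # added to the PPO objective each update

# Matching the reference incurs no penalty; diverging from it is nudged back.
print(hr_ppo_penalty(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # 0.0
print(hr_ppo_penalty(np.array([3.0, 0.0]), np.array([0.0, 3.0])))  # > 0
```

Keeping `beta` small is what lets the agent pursue its own goals while staying human-compatible, which is the balance the reported success and collision rates reflect.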
arXiv Detail & Related papers (2024-03-28T17:56:56Z) - Introducing Risk Shadowing For Decisive and Comfortable Behavior Planning [0.0]
We develop risk shadowing, a situation understanding method that allows us to go beyond single interactions.
We show that using risk shadowing as an upstream filter module for a behavior planner allows to plan more decisive and comfortable driving strategies.
arXiv Detail & Related papers (2023-07-20T09:16:01Z) - When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning [57.53138994155612]
A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world.
A critical challenge is the presence of irreversible states which require external assistance to recover from, such as when a robot arm has pushed an object off of a table.
We propose an algorithm that efficiently learns to detect and avoid irreversible states, and proactively asks for help in case the agent does enter them.
arXiv Detail & Related papers (2022-10-19T17:57:24Z) - Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z) - The Concept of Criticality in AI Safety [8.442084903594528]
When AI agents do not align their actions with human values, they may cause serious harm.
One way to solve the value alignment problem is by including a human operator who monitors all of the agent's actions.
We propose a much more efficient solution that allows an operator to be engaged in other activities without neglecting their monitoring task.
arXiv Detail & Related papers (2022-01-12T17:44:22Z) - Balancing Performance and Human Autonomy with Implicit Guidance Agent [8.071506311915396]
We show that implicit guidance is effective for enabling humans to maintain a balance between improving their plans and retaining autonomy.
We modeled a collaborative agent with implicit guidance by integrating the Bayesian Theory of Mind into existing collaborative-planning algorithms.
arXiv Detail & Related papers (2021-09-01T14:47:29Z) - Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z) - A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores [85.12096045419686]
We study the adoption of an algorithmic tool used to assist child maltreatment hotline screening decisions.
We first show that humans do alter their behavior when the tool is deployed.
We show that humans are less likely to adhere to the machine's recommendation when the score displayed is an incorrect estimate of risk.
arXiv Detail & Related papers (2020-02-19T07:27:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.