Explaining Agent Behavior with Large Language Models
- URL: http://arxiv.org/abs/2309.10346v1
- Date: Tue, 19 Sep 2023 06:13:24 GMT
- Title: Explaining Agent Behavior with Large Language Models
- Authors: Xijia Zhang, Yue Guo, Simon Stepputtis, Katia Sycara, and Joseph
Campbell
- Abstract summary: We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions.
We show how a compact representation of the agent's behavior can be learned and used to produce plausible explanations.
- Score: 7.128139268426959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent agents such as robots are increasingly deployed in real-world,
safety-critical settings. It is vital that these agents are able to explain the
reasoning behind their decisions to human counterparts; however, their behavior
is often produced by uninterpretable models such as deep neural networks. We
propose an approach to generate natural language explanations for an agent's
behavior based only on observations of states and actions, agnostic to the
underlying model representation. We show how a compact representation of the
agent's behavior can be learned and used to produce plausible explanations with
minimal hallucination while affording user interaction with a pre-trained large
language model. Through user studies and empirical experiments, we show that
our approach generates explanations as helpful as those generated by a human
domain expert while enabling beneficial interactions such as clarification and
counterfactual queries.
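As a rough mental model of the pipeline in the abstract: observe (state, action) pairs, distill them into a compact, text-friendly behavior summary, and use that summary to ground an LLM prompt so the explanation stays close to the observed data. The sketch below is only an illustration of that idea; the counting-based summary and prompt template are assumptions, not the authors' learned representation.

```python
from collections import Counter

def summarize_behavior(trajectory):
    """Distill observed (state, action) pairs into a compact, text-friendly summary.

    This counting-based summary is an illustrative stand-in for the learned
    compact representation described in the abstract.
    """
    action_counts = Counter(action for _, action in trajectory)
    visited_states = {state for state, _ in trajectory}
    return {
        "most_common_actions": action_counts.most_common(3),
        "num_distinct_states": len(visited_states),
        "trajectory_length": len(trajectory),
    }

def build_explanation_prompt(summary, question):
    """Ground the LLM in the behavior summary to limit hallucination."""
    return (
        "You are explaining an agent's behavior using ONLY the facts below.\n"
        f"Behavior summary: {summary}\n"
        f"User question: {question}\n"
        "If the summary does not contain the answer, say so."
    )

# Example usage with a toy grid-world trajectory (states are (x, y) cells).
trajectory = [((0, 0), "right"), ((1, 0), "right"), ((2, 0), "up"), ((2, 1), "up")]
summary = summarize_behavior(trajectory)
prompt = build_explanation_prompt(summary, "Why did the agent move right first?")
print(prompt)  # This prompt would then be sent to a pre-trained LLM of choice.
```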
Related papers
- Implementation and Application of an Intelligibility Protocol for Interaction with an LLM [0.9187505256430948]
Our interest is in constructing interactive systems involving a human-expert interacting with a machine learning engine.
This is of relevance when addressing complex problems arising in areas of science, the environment, medicine and so on.
We present an algorithmic description of general-purpose implementation, and conduct preliminary experiments on its use in two different areas.
arXiv Detail & Related papers (2024-10-27T21:20:18Z) - Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning [53.45295657891099]
This paper proposes Visual-O1, a multi-modal multi-turn chain-of-thought reasoning framework.
It simulates human multi-modal multi-turn reasoning, providing instantial experience for highly intelligent models.
Our work highlights the potential of artificial intelligence to work like humans in real-world scenarios with uncertainty and ambiguity.
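For intuition only, a multi-turn chain-of-thought loop over an ambiguous instruction can be sketched as below; the `llm` callable is hypothetical and the turn structure is a generic pattern, not the Visual-O1 algorithm itself.

```python
def multi_turn_reasoning(llm, instruction, image_caption, max_turns=3):
    """Iteratively refine an interpretation of an ambiguous instruction.

    `llm` is a hypothetical callable (prompt -> text); this loop is a generic
    chain-of-thought sketch, not the method proposed in the paper.
    """
    context = f"Image: {image_caption}\nInstruction: {instruction}"
    thoughts = []
    for _ in range(max_turns):
        prompt = (
            f"{context}\n"
            f"Previous reasoning: {' '.join(thoughts) or 'none'}\n"
            "Think step by step about what the instruction most likely means, "
            "then reply with another clarifying thought or FINAL: <answer>."
        )
        reply = llm(prompt)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        thoughts.append(reply)
    return thoughts[-1] if thoughts else ""
```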
arXiv Detail & Related papers (2024-10-04T11:18:41Z) - Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness and potential for misunderstanding in saliency-based explanations.
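Saliency maps of the kind evaluated here are commonly computed from input gradients; the vanilla-gradient sketch below (PyTorch, with a placeholder model and random input) shows the general mechanism, not the specific methods studied in the paper.

```python
import torch

def vanilla_gradient_saliency(model, image, target_class):
    """Vanilla-gradient saliency: |d score_target / d input|, max over channels."""
    image = image.clone().detach().requires_grad_(True)  # (1, C, H, W)
    score = model(image)[0, target_class]                # scalar class score
    score.backward()
    saliency, _ = image.grad.abs().max(dim=1)            # (1, H, W)
    return saliency.squeeze(0)

# Example with a toy model and input, just to show the call pattern.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)
heatmap = vanilla_gradient_saliency(model, image, target_class=3)
print(heatmap.shape)  # torch.Size([32, 32])
```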
arXiv Detail & Related papers (2023-12-10T23:13:23Z) - Sim-to-Real Causal Transfer: A Metric Learning Approach to
Causally-Aware Interaction Representations [62.48505112245388]
We take an in-depth look at the causal awareness of modern representations of agent interactions.
We show that recent representations are already partially resilient to perturbations of non-causal agents.
We propose a metric learning approach that regularizes latent representations with causal annotations.
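One way to picture such a regularizer is a margin-based metric-learning loss over latent interaction embeddings, where causal annotations decide which perturbed scenes should stay close to the original and which should move away. The sketch below is a generic formulation under that assumption, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def causal_contrastive_loss(anchor, causal_perturbed, non_causal_perturbed, margin=1.0):
    """Generic metric-learning regularizer sketch.

    anchor:               embedding of the original scene                 (B, D)
    causal_perturbed:     embedding after perturbing a causal agent       (B, D)
    non_causal_perturbed: embedding after perturbing a non-causal agent   (B, D)

    Perturbing a non-causal agent should leave the representation nearly
    unchanged; perturbing a causal agent should move it by at least `margin`.
    """
    d_non_causal = F.pairwise_distance(anchor, non_causal_perturbed)
    d_causal = F.pairwise_distance(anchor, causal_perturbed)
    return (d_non_causal + F.relu(margin - d_causal)).mean()

# Example usage on random embeddings.
B, D = 8, 64
loss = causal_contrastive_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, D))
print(loss.item())
```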
arXiv Detail & Related papers (2023-12-07T18:57:03Z) - Understanding Your Agent: Leveraging Large Language Models for Behavior
Explanation [7.647395374489533]
We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions.
We show that our approach generates explanations as helpful as those produced by a human domain expert.
arXiv Detail & Related papers (2023-11-29T20:16:23Z) - Interpretability in the Wild: a Circuit for Indirect Object
Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI)
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
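The IOI task asks the model to complete sentences such as "When Mary and John went to the store, John gave a drink to" with the indirect object ("Mary"). The snippet below reproduces that behavioral probe with the Hugging Face GPT-2 small checkpoint; it only demonstrates the task, not the circuit analysis in the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # GPT-2 small
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "When Mary and John went to the store, John gave a drink to"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits

# Compare the two name completions; IOI behavior prefers the indirect object.
for name in [" Mary", " John"]:
    token_id = tokenizer.encode(name)[0]
    print(name, logits[token_id].item())
```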
arXiv Detail & Related papers (2022-11-01T17:08:44Z) - Inherently Explainable Reinforcement Learning in Natural Language [14.117921448623342]
We focus on the task of creating a reinforcement learning agent that is inherently explainable.
This Hierarchically Explainable Reinforcement Learning agent operates in Interactive Fictions, text-based game environments.
Our agent is designed to treat explainability as a first-class citizen.
arXiv Detail & Related papers (2021-12-16T14:24:35Z) - Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
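A copula separates the marginals from the dependence structure. The Gaussian-copula sketch below shows that general mechanism with NumPy/SciPy; the marginals and correlation matrix are toy choices, not the model learned in the paper.

```python
import numpy as np
from scipy import stats

def sample_gaussian_copula(corr, marginals, n_samples, seed=0):
    """Sample jointly from given marginals coupled by a Gaussian copula.

    corr:      (k, k) correlation matrix encoding inter-agent dependence
    marginals: list of k frozen scipy distributions (the per-agent marginals)
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(marginals)), corr, size=n_samples)
    u = stats.norm.cdf(z)  # correlated samples with uniform marginals
    return np.column_stack([m.ppf(u[:, i]) for i, m in enumerate(marginals)])

# Two "agents" with different marginal distributions but coordinated behavior.
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
marginals = [stats.norm(0, 1), stats.expon(scale=2.0)]
samples = sample_gaussian_copula(corr, marginals, n_samples=1000)
print(np.corrcoef(samples.T)[0, 1])  # positive correlation induced by the copula
```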
arXiv Detail & Related papers (2021-07-10T03:49:41Z) - Deep Interpretable Models of Theory of Mind For Human-Agent Teaming [0.7734726150561086]
We develop an interpretable modular neural framework for modeling the intentions of other observed entities.
We demonstrate the efficacy of our approach with experiments on data from human participants on a search and rescue task in Minecraft.
arXiv Detail & Related papers (2021-04-07T06:18:58Z) - Imitating Interactive Intelligence [24.95842455898523]
We study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment.
To build agents that can robustly interact with humans, we would ideally train them while they interact with humans.
We use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour.
arXiv Detail & Related papers (2020-12-10T13:55:47Z) - A general framework for scientifically inspired explanations in AI [76.48625630211943]
We instantiate the concept of structure of scientific explanation as the theoretical underpinning for a general framework in which explanations for AI systems can be implemented.
This framework aims to provide the tools to build a "mental-model" of any AI system so that the interaction with the user can provide information on demand and be closer to the nature of human-made explanations.
arXiv Detail & Related papers (2020-03-02T10:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.