Explaining Agent Behavior with Large Language Models
- URL: http://arxiv.org/abs/2309.10346v1
- Date: Tue, 19 Sep 2023 06:13:24 GMT
- Title: Explaining Agent Behavior with Large Language Models
- Authors: Xijia Zhang, Yue Guo, Simon Stepputtis, Katia Sycara, and Joseph Campbell
- Abstract summary: We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions.
We show how a compact representation of the agent's behavior can be learned and used to produce plausible explanations.
- Score: 7.128139268426959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent agents such as robots are increasingly deployed in real-world,
safety-critical settings. It is vital that these agents are able to explain the
reasoning behind their decisions to human counterparts; however, their behavior
is often produced by uninterpretable models such as deep neural networks. We
propose an approach to generate natural language explanations for an agent's
behavior based only on observations of states and actions, agnostic to the
underlying model representation. We show how a compact representation of the
agent's behavior can be learned and used to produce plausible explanations with
minimal hallucination while affording user interaction with a pre-trained large
language model. Through user studies and empirical experiments, we show that
our approach generates explanations as helpful as those generated by a human
domain expert while enabling beneficial interactions such as clarification and
counterfactual queries.
Related papers
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness and potential for misunderstanding in saliency-based explanations.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations [62.48505112245388]
We take an in-depth look at the causal awareness of modern representations of agent interactions.
We show that recent representations are already partially resilient to perturbations of non-causal agents.
We propose a metric learning approach that regularizes latent representations with causal annotations.
arXiv Detail & Related papers (2023-12-07T18:57:03Z)
- Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation [7.647395374489533]
We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions.
We show that our approach generates explanations as helpful as those produced by a human domain expert.
arXiv Detail & Related papers (2023-11-29T20:16:23Z)
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z)
- Inherently Explainable Reinforcement Learning in Natural Language [14.117921448623342]
We focus on the task of creating a reinforcement learning agent that is inherently explainable.
This Hierarchically Explainable Reinforcement Learning agent operates in Interactive Fictions, text-based game environments.
Our agent is designed to treat explainability as a first-class citizen.
arXiv Detail & Related papers (2021-12-16T14:24:35Z)
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use a copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
- Deep Interpretable Models of Theory of Mind For Human-Agent Teaming [0.7734726150561086]
We develop an interpretable modular neural framework for modeling the intentions of other observed entities.
We demonstrate the efficacy of our approach with experiments on data from human participants on a search and rescue task in Minecraft.
arXiv Detail & Related papers (2021-04-07T06:18:58Z)
- Imitating Interactive Intelligence [24.95842455898523]
We study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment.
To build agents that can robustly interact with humans, we would ideally train them while they interact with humans.
We use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour.
arXiv Detail & Related papers (2020-12-10T13:55:47Z)
- Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article deals with aspects of modeling commonsense reasoning, focusing on domains such as interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z)
- Probing Emergent Semantics in Predictive Agents via Question Answering [29.123837711842995]
Recent work has shown how predictive modeling can endow agents with rich knowledge of their surroundings, improving their ability to act in complex environments.
We propose question-answering as a general paradigm to decode and understand the representations that such agents develop.
We probe their internal state representations with synthetic (English) questions, without backpropagating gradients from the question-answering decoder into the agent.
arXiv Detail & Related papers (2020-06-01T15:27:36Z)
- A general framework for scientifically inspired explanations in AI [76.48625630211943]
We instantiate the concept of structure of scientific explanation as the theoretical underpinning for a general framework in which explanations for AI systems can be implemented.
This framework aims to provide the tools to build a "mental-model" of any AI system so that the interaction with the user can provide information on demand and be closer to the nature of human-made explanations.
arXiv Detail & Related papers (2020-03-02T10:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.