Pavlovian Signalling with General Value Functions in Agent-Agent
Temporal Decision Making
- URL: http://arxiv.org/abs/2201.03709v1
- Date: Tue, 11 Jan 2022 00:14:04 GMT
- Title: Pavlovian Signalling with General Value Functions in Agent-Agent
Temporal Decision Making
- Authors: Andrew Butcher, Michael Bradley Johanson, Elnaz Davoodi, Dylan J. A.
Brenneis, Leslie Acker, Adam S. R. Parker, Adam White, Joseph Modayil,
Patrick M. Pilarski
- Abstract summary: We study Pavlovian signalling -- a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent.
As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning between two agents.
- Score: 6.704848594973921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we contribute a multi-faceted study into Pavlovian signalling
-- a process by which learned, temporally extended predictions made by one
agent inform decision-making by another agent. Signalling is intimately
connected to time and timing. In service of generating and receiving signals,
humans and other animals are known to represent time, determine time since past
events, predict the time until a future stimulus, and both recognize and
generate patterns that unfold in time. We investigate how different temporal
processes impact coordination and signalling between learning agents by
introducing a partially observable decision-making domain we call the Frost
Hollow. In this domain, a prediction learning agent and a reinforcement
learning agent are coupled into a two-part decision-making system that works to
acquire sparse reward while avoiding time-conditional hazards. We evaluate two
domain variations: machine agents interacting in a seven-state linear walk, and
human-machine interaction in a virtual-reality environment. Our results
showcase the speed of learning for Pavlovian signalling, the impact that
different temporal representations do (and do not) have on agent-agent
coordination, and how temporal aliasing impacts agent-agent and human-agent
interactions differently. As a main contribution, we establish Pavlovian
signalling as a natural bridge between fixed signalling paradigms and fully
adaptive communication learning between two agents. We further show how to
computationally build this adaptive signalling process out of a fixed
signalling process, characterized by fast continual prediction learning and
minimal constraints on the nature of the agent receiving signals. Our results
therefore suggest an actionable, constructivist path towards communication
learning between reinforcement learning agents.
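As a rough illustration of the idea in the abstract, the sketch below learns a temporally extended prediction of a periodic hazard with TD(0) and converts it into a binary token through a fixed threshold. It is not the authors' code or the Frost Hollow domain: the function name `run_pavlovian_signalling`, the ten-step cycle, the two-step hazard, the tabular phase feature, and every parameter value are assumptions chosen only to keep the example small.

```python
def run_pavlovian_signalling(period=10, hazard_len=2, gamma=0.5, alpha=0.1,
                             threshold=0.5, steps=5000):
    """Learn a hazard prediction with TD(0) and map it to a binary token.

    All names and defaults are hypothetical; this is a toy cycle, not the
    seven-state linear walk described in the paper.
    """
    # One weight per phase of the cycle: a tabular stand-in for a
    # "time since the cycle started" representation.
    w = [0.0] * period
    tokens = [0] * period
    phase = 0
    for _ in range(steps):
        next_phase = (phase + 1) % period
        # Cumulant: 1 if the hazard is active on the next step, else 0.
        # The hazard occupies the last `hazard_len` phases of the cycle.
        cumulant = 1.0 if next_phase >= period - hazard_len else 0.0
        # TD(0) update for a prediction with constant discount gamma.
        td_error = cumulant + gamma * w[next_phase] - w[phase]
        w[phase] += alpha * td_error
        # Fixed (not learned) mapping from prediction to signal:
        # emit token 1 whenever the prediction exceeds the threshold.
        tokens[phase] = 1 if w[phase] > threshold else 0
        phase = next_phase
    return w, tokens


if __name__ == "__main__":
    weights, tokens = run_pavlovian_signalling()
    for p, (v, t) in enumerate(zip(weights, tokens)):
        print(f"phase {p}: prediction={v:.3f} token={t}")
```

With these assumed defaults the hazard occupies phases 8 and 9, and the learned prediction rises ahead of it, so the token should switch on a couple of steps before hazard onset. A receiving agent would only need to react to the token, which is one way to read the abstract's "minimal constraints on the nature of the agent receiving signals".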
Related papers
- Neural Interaction Energy for Multi-Agent Trajectory Prediction [55.098754835213995]
We introduce a framework called Multi-Agent Trajectory prediction via neural interaction Energy (MATE)
MATE assesses the interactive motion of agents by employing neural interaction energy.
To bolster temporal stability, we introduce two constraints: inter-agent interaction constraint and intra-agent motion constraint.
arXiv Detail & Related papers (2024-04-25T12:47:47Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- SMEMO: Social Memory for Trajectory Forecasting [34.542209630734234]
We present a neural network based on an end-to-end trainable working memory, which acts as an external storage.
We show that our method is capable of learning explainable cause-effect relationships between motions of different agents, obtaining state-of-the-art results on trajectory forecasting datasets.
arXiv Detail & Related papers (2022-03-23T14:40:20Z)
- The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents [7.980685978549764]
This paper contributes a multi-faceted study into what we term Pavlovian signalling.
We establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning.
Our results point to an actionable, constructivist path towards continual communication learning between reinforcement learning agents.
arXiv Detail & Related papers (2022-03-17T17:49:45Z)
- Assessing Human Interaction in Virtual Reality With Continually Learning Prediction Agents Based on Reinforcement Learning Algorithms: A Pilot Study [6.076137037890219]
We investigate how the interaction between a human and a continually learning prediction agent develops as the agent develops competency.
We develop a virtual reality environment and a time-based prediction task wherein learned predictions from a reinforcement learning (RL) algorithm augment human predictions.
Our findings suggest that human trust of the system may be influenced by early interactions with the agent, and that trust in turn affects strategic behaviour.
arXiv Detail & Related papers (2021-12-14T22:46:44Z)
- Learning Proxemic Behavior Using Reinforcement Learning with Cognitive Agents [1.0635883951034306]
Proxemics is a branch of non-verbal communication concerned with studying the spatial behavior of people and animals.
We study how agents behave in environments based on proxemic behavior.
arXiv Detail & Related papers (2021-08-08T20:45:34Z)
- Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction [97.40338982628094]
We propose a simple yet effective Unlimited Neighborhood Interaction Network (UNIN) which predicts trajectories of heterogeneous agents in multiple categories.
Specifically, the proposed unlimited neighborhood interaction module generates the fused-features of all agents involved in an interaction simultaneously.
A hierarchical graph attention module is proposed to obtain category-to-category interaction and agent-to-agent interaction.
arXiv Detail & Related papers (2021-07-31T13:36:04Z)
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
- Investigating Human Response, Behaviour, and Preference in Joint-Task Interaction [3.774610219328564]
We have designed an experiment in order to examine human behaviour and response as they interact with Explainable Planning (XAIP) agents.
We also present the results from an empirical analysis where we examined the behaviour of the two agents for simulated users.
arXiv Detail & Related papers (2020-11-27T22:16:59Z)
- Learning to Communicate and Correct Pose Errors [75.03747122616605]
We study the setting proposed in V2VNet, where nearby self-driving vehicles jointly perform object detection and motion forecasting in a cooperative manner.
We propose a novel neural reasoning framework that learns to communicate, to estimate potential errors, and to reach a consensus about those errors.
arXiv Detail & Related papers (2020-11-10T18:19:40Z)
- On the interaction between supervision and self-play in emergent communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)