Learning Communication Policies for Different Follower Behaviors in a
Collaborative Reference Game
- URL: http://arxiv.org/abs/2402.04824v1
- Date: Wed, 7 Feb 2024 13:22:17 GMT
- Title: Learning Communication Policies for Different Follower Behaviors in a
Collaborative Reference Game
- Authors: Philipp Sadler, Sherzod Hakimov and David Schlangen
- Abstract summary: We evaluate the adaptability of neural artificial agents towards assumed partner behaviors in a collaborative reference game.
Our results indicate that this novel ingredient leads to communicative strategies that are less verbose.
- Score: 22.28337771947361
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Albrecht and Stone (2018) state that modeling of changing behaviors remains
an open problem "due to the essentially unconstrained nature of what other
agents may do". In this work we evaluate the adaptability of neural artificial
agents towards assumed partner behaviors in a collaborative reference game. In
this game, success is achieved when a knowledgeable Guide can verbally lead a
Follower to the selection of a specific puzzle piece among several distractors.
We frame this language grounding and coordination task as a reinforcement
learning problem and measure to what extent a common reinforcement learning
algorithm (PPO) is able to produce neural agents (the Guides) that perform well
with various heuristic Follower behaviors that vary along the dimensions of
confidence and autonomy. We experiment with a learning signal that, in addition
to the goal condition, also accounts for an assumed communicative effort. Our
results indicate that this novel ingredient leads to communicative strategies
that are less verbose (staying silent in some of the steps) and that, in this
respect, the Guide's strategies indeed adapt to the partner's level of
confidence and autonomy.
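As a concrete illustration of the learning signal described above, the sketch below combines a goal reward with a per-step penalty for speaking. The function name, the constants, and the silence convention are illustrative assumptions, not the paper's actual reward definition.

```python
# Hypothetical sketch of a shaped reward that respects communicative effort.
# The constants and the silence convention are assumptions for illustration.

def guide_step_reward(goal_reached: bool, utterance: str | None,
                      goal_reward: float = 1.0,
                      effort_cost: float = 0.1) -> float:
    """Reward the goal condition, but charge the Guide for non-silent steps."""
    reward = goal_reward if goal_reached else 0.0
    if utterance is not None:  # staying silent avoids the effort cost
        reward -= effort_cost
    return reward
```

A Guide trained with PPO to maximize this return is pushed toward less verbose strategies: it can learn to stay silent on steps where the assumed Follower (e.g., a confident, autonomous one) is expected to act correctly without further instruction.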
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- Learning to Coordinate without Communication under Incomplete Information [39.106914895158035]
We show how an autonomous agent can learn to cooperate by interpreting its partner's actions.
Experimental results in a testbed called Gnomes at Night show that the learned no-communication coordination strategy achieves significantly higher success rates.
arXiv Detail & Related papers (2024-09-19T01:41:41Z)
- AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents [58.807802111818994]
We propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions.
Our approach begins by developing a set of atomic actions with a low-level controller trained via imitation learning.
An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering.
arXiv Detail & Related papers (2024-03-19T15:41:39Z)
- GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment [72.96949760114575]
We propose a novel cooperative communication framework, Goal-Oriented Mental Alignment (GOMA).
GOMA formulates verbal communication as a planning problem that minimizes the misalignment between parts of agents' mental states that are relevant to the goals.
We evaluate our approach against strong baselines in two challenging environments, Overcooked (a multiplayer game) and VirtualHome (a household simulator).
arXiv Detail & Related papers (2024-03-17T03:52:52Z)
- Learning Intuitive Policies Using Action Features [7.260481131198059]
We investigate the effect of network architecture on the propensity of learning algorithms to exploit semantic relationships.
We find that attention-based architectures that jointly process a featurized representation of observations and actions have a better inductive bias for learning intuitive policies.
arXiv Detail & Related papers (2022-01-29T20:54:52Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria [57.74495091445414]
Social deduction games offer an avenue to study how individuals might learn to synthesize potentially unreliable information about others.
In this work, we present Hidden Agenda, a two-team social deduction game that provides a 2D environment for studying learning agents in scenarios of unknown team alignment.
Reinforcement learning agents trained in Hidden Agenda learn a variety of behaviors, including partnering and voting, without the need for communication in natural language.
arXiv Detail & Related papers (2022-01-05T20:54:10Z)
- Curriculum-Driven Multi-Agent Learning and the Role of Implicit Communication in Teamwork [24.92668968807012]
We propose a curriculum-driven learning strategy for solving difficult multi-agent coordination tasks.
We argue that emergent implicit communication plays a large role in enabling superior levels of coordination.
arXiv Detail & Related papers (2021-06-21T14:54:07Z)
- Connecting Context-specific Adaptation in Humans to Meta-learning [23.923548278086383]
We show how context-conditioned meta-learning can capture human behavior in a cognitive task.
Our work demonstrates that guiding meta-learning with task information can capture complex, human-like behavior.
arXiv Detail & Related papers (2020-11-27T15:31:39Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
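For orientation, work in this behavior-prior line commonly regularizes the task policy toward a learned default policy via a KL term. The sketch below shows that standard objective for discrete action distributions; the function name and the trade-off weight alpha are illustrative assumptions, not values from the paper.

```python
import numpy as np

def kl_regularized_return(rewards, pi, pi_prior, alpha=0.1):
    """Sum of per-step rewards minus alpha * KL(pi || pi_prior).

    rewards:  per-step rewards, shape (T,)
    pi:       task policy action distributions, shape (T, A)
    pi_prior: behavior-prior action distributions, shape (T, A)
    """
    kl = np.sum(pi * (np.log(pi) - np.log(pi_prior)), axis=-1)  # per-step KL
    return float(np.sum(rewards - alpha * kl))
```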