On the interaction between supervision and self-play in emergent communication
- URL: http://arxiv.org/abs/2002.01093v2
- Date: Mon, 22 Jun 2020 20:48:08 GMT
- Title: On the interaction between supervision and self-play in emergent communication
- Authors: Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau
- Abstract summary: We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
- Score: 82.290338507106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A promising approach for teaching artificial agents to use natural language
involves using human-in-the-loop training. However, recent work suggests that
current machine learning methods are too data inefficient to be trained in this
way from scratch. In this paper, we investigate the relationship between two
categories of learning signals with the ultimate goal of improving sample
efficiency: imitating human language data via supervised learning, and
maximizing reward in a simulated multi-agent environment via self-play (as done
in emergent communication), and introduce the term supervised self-play (S2P)
for algorithms using both of these signals. We find that first training agents
via supervised learning on human data followed by self-play outperforms the
converse, suggesting that it is not beneficial to emerge languages from
scratch. We then empirically investigate various S2P schedules that begin with
supervised learning in two environments: a Lewis signaling game with symbolic
inputs, and an image-based referential game with natural language descriptions.
Lastly, we introduce population-based approaches to S2P, which further improve
performance over single-agent methods.
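To make the schedule concrete, below is a minimal, runnable sketch of an S2P-style pipeline in a toy Lewis signaling game: a speaker and a listener are first trained with supervision on a fixed "human" lexicon, then fine-tuned jointly via self-play on task reward. The linear agents, toy lexicon, REINFORCE objective, and all hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical S2P sketch: supervised pretraining, then self-play fine-tuning.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_OBJECTS, N_MESSAGES = 8, 8  # symbolic inputs, discrete one-symbol channel

speaker = nn.Linear(N_OBJECTS, N_MESSAGES)   # object -> message logits
listener = nn.Linear(N_MESSAGES, N_OBJECTS)  # message -> object logits
opt = torch.optim.Adam(
    list(speaker.parameters()) + list(listener.parameters()), lr=0.05
)

def one_hot(idx, n):
    return F.one_hot(idx, n).float()

# Toy stand-in for human data: a fixed object -> message lexicon.
human_lexicon = torch.randperm(N_MESSAGES)

# Phase 1: supervised learning on "human" (object, message) pairs.
for _ in range(200):
    objs = torch.randint(N_OBJECTS, (32,))
    msgs = human_lexicon[objs]
    loss = (F.cross_entropy(speaker(one_hot(objs, N_OBJECTS)), msgs)
            + F.cross_entropy(listener(one_hot(msgs, N_MESSAGES)), objs))
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: self-play fine-tuning on task reward (REINFORCE for the
# speaker's discrete message choice, supervised loss for the listener).
for _ in range(200):
    objs = torch.randint(N_OBJECTS, (32,))
    msg_dist = torch.distributions.Categorical(
        logits=speaker(one_hot(objs, N_OBJECTS))
    )
    msgs = msg_dist.sample()
    guess_logits = listener(one_hot(msgs, N_MESSAGES))
    reward = (guess_logits.argmax(-1) == objs).float()  # 1 if listener succeeds
    advantage = (reward - reward.mean()).detach()
    loss = (-(advantage * msg_dist.log_prob(msgs)).mean()
            + F.cross_entropy(guess_logits, objs))
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final self-play batch accuracy: {reward.mean().item():.2f}")
```

The population-based variants mentioned in the abstract would, on this reading, maintain several speaker-listener pairs and mix partners during the self-play phase; that detail is omitted here for brevity.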
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI).
The experimental results demonstrate that MPI improves by 10% to 64% over the previous state-of-the-art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Policy Learning with a Language Bottleneck [65.99843627646018]
Policy Learning with a Language Bottleneck (PLLB) is a framework enabling AI agents to generate linguistic rules.
PLLB alternates between a rule-generation step guided by language models and an update step where agents learn new policies guided by the rules (a toy sketch of this loop appears after the list below).
In a two-player communication game, a maze-solving task, and two image reconstruction tasks, we show that PLLB agents not only learn more interpretable and generalizable behaviors, but can also share the learned rules with human users.
arXiv Detail & Related papers (2024-05-07T08:40:21Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Contrastive Language, Action, and State Pre-training for Robot Learning [1.1000499414131326]
We introduce a method for unifying language, action, and state information in a shared embedding space to facilitate a range of downstream tasks in robot learning.
Our method, Contrastive Language, Action, and State Pre-training (CLASP), extends the CLIP formulation by incorporating distributional learning, capturing the inherent complexities and one-to-many relationships in behaviour-text alignment.
We demonstrate the utility of our method for the following downstream tasks: zero-shot text-behaviour retrieval, captioning unseen robot behaviours, and learning a behaviour prior for language-conditioned reinforcement learning.
arXiv Detail & Related papers (2023-04-21T07:19:33Z)
- Intra-agent speech permits zero-shot task acquisition [13.19051572784014]
We take inspiration from processes of "inner speech" in humans to better understand the role of intra-agent speech in embodied behavior.
We develop algorithms that enable visually grounded captioning with little labeled language data.
We incorporate intra-agent speech into an embodied, mobile manipulator agent operating in a 3D virtual world.
arXiv Detail & Related papers (2022-06-07T09:28:10Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Language-Conditioned Imitation Learning for Robot Manipulation Tasks [39.40937105264774]
We introduce a method for incorporating unstructured natural language into imitation learning.
At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent.
The training process then interrelates these two modalities to encode the correlations between language, perception, and motion.
The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions.
arXiv Detail & Related papers (2020-10-22T21:49:08Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop TrajNet++, a large-scale interaction-centric benchmark that has been a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
- Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text [12.88819706338837]
Recent work has described neural-network-based agents that are trained with reinforcement learning to execute language-like commands in simulated worlds.
We propose a conceptually simple method for training instruction-following agents with deep RL that are robust to natural human instructions.
arXiv Detail & Related papers (2020-05-19T12:16:58Z)
- Multi-agent Communication meets Natural Language: Synergies between Functional and Structural Language Learning [16.776753238108036]
We present a method for combining multi-agent communication and traditional data-driven approaches to natural language learning.
Our starting point is a language model that has been trained on generic, not task-specific language data.
We then place this model in a multi-agent self-play environment that generates task-specific rewards used to adapt or modulate the model.
arXiv Detail & Related papers (2020-05-14T15:32:23Z)
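As referenced in the Policy Learning with a Language Bottleneck entry above, here is a minimal toy sketch of the alternating rule-generation / policy-update loop described there. The environment, the `summarize_rule` stub (standing in for the language model), and the epsilon-greedy update are all hypothetical illustrations, not the paper's method.

```python
# Hypothetical PLLB-style loop: act, summarize a rule, update the policy.
import random

ACTIONS = ["left", "right"]

def run_episode(policy):
    """One step of a toy task: the agent should go right."""
    action = policy()
    reward = 1.0 if action == "right" else 0.0
    return action, reward

def summarize_rule(episodes):
    """Stub for the LM-guided rule-generation step: describe what the
    highest-reward episode did, as a (hypothetical) linguistic rule."""
    best = max(episodes, key=lambda e: e[1])
    return f"prefer action '{best[0]}'"

def make_policy(rule, epsilon=0.2):
    """Update step: a policy that follows the rule, with some exploration."""
    preferred = rule.split("'")[1] if rule else random.choice(ACTIONS)
    return lambda: preferred if random.random() > epsilon else random.choice(ACTIONS)

rule = None
for it in range(5):  # alternate: act -> generate rule -> update policy
    policy = make_policy(rule)
    episodes = [run_episode(policy) for _ in range(20)]
    rule = summarize_rule(episodes)
    mean_r = sum(r for _, r in episodes) / len(episodes)
    print(f"iter {it}: rule={rule!r}, mean reward={mean_r:.2f}")
```

Because the rule is a plain string, it can in principle be shown to a human user, which is the interpretability property the entry above highlights.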
This list is automatically generated from the titles and abstracts of the papers in this site.