Improving Policy Learning via Language Dynamics Distillation
- URL: http://arxiv.org/abs/2210.00066v1
- Date: Fri, 30 Sep 2022 19:56:04 GMT
- Title: Improving Policy Learning via Language Dynamics Distillation
- Authors: Victor Zhong, Jesse Mu, Luke Zettlemoyer, Edward Grefenstette, Tim Rocktäschel
- Abstract summary: We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions.
We show that language descriptions in demonstrations improve sample-efficiency and generalization across environments.
- Score: 87.27583619910338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has shown that augmenting environments with language descriptions
improves policy learning. However, for environments with complex language
abstractions, learning how to ground language to observations is difficult due
to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD),
which pretrains a model to predict environment dynamics given demonstrations
with language descriptions, and then fine-tunes these language-aware pretrained
representations via reinforcement learning (RL). In this way, the model is
trained to both maximize expected reward and retain knowledge about how
language relates to environment dynamics. On SILG, a benchmark of five tasks
with language descriptions that evaluate distinct generalization challenges on
unseen environments (NetHack, ALFWorld, RTFM, Messenger, and Touchdown), LDD
outperforms tabula-rasa RL, VAE pretraining, and methods that learn from
unlabeled demonstrations in inverse RL and reward shaping with pretrained
experts. In our analyses, we show that language descriptions in demonstrations
improve sample-efficiency and generalization across environments, and that
dynamics modelling with expert demonstrations is more effective than with
non-experts.
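In outline, LDD is a two-stage procedure: pretrain a language-aware encoder on dynamics prediction from demonstrations, then fine-tune the same representations with RL. The PyTorch sketch below illustrates the shape of that recipe under toy assumptions; the encoder, heads, dimensions, random stand-in data, and the one-step REINFORCE update are illustrative, not the paper's actual architecture or training loop.

```python
# A minimal sketch of the two-stage LDD recipe, assuming toy vector
# observations and pre-embedded language descriptions.
import torch
import torch.nn as nn

OBS_DIM, TEXT_DIM, ACT_DIM, HID = 32, 16, 4, 64

class LanguageAwareEncoder(nn.Module):
    """Fuses an observation with its language description."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + TEXT_DIM, HID), nn.ReLU(),
            nn.Linear(HID, HID), nn.ReLU(),
        )
    def forward(self, obs, text):
        return self.net(torch.cat([obs, text], dim=-1))

encoder = LanguageAwareEncoder()
dynamics_head = nn.Linear(HID + ACT_DIM, OBS_DIM)  # predicts next observation
policy_head = nn.Linear(HID, ACT_DIM)              # logits over actions

# --- Stage 1: dynamics pretraining on demonstrations --------------------
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(dynamics_head.parameters()), lr=1e-3)
for _ in range(100):                         # demo minibatches (random here)
    obs = torch.randn(8, OBS_DIM)
    text = torch.randn(8, TEXT_DIM)          # embedded language description
    act = torch.eye(ACT_DIM)[torch.randint(ACT_DIM, (8,))]
    next_obs = torch.randn(8, OBS_DIM)       # would come from expert demos
    pred = dynamics_head(torch.cat([encoder(obs, text), act], dim=-1))
    loss = nn.functional.mse_loss(pred, next_obs)
    opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 2: RL fine-tuning of the pretrained representations ----------
# The pretrained encoder is reused under the policy; any policy-gradient
# algorithm could replace this one-step REINFORCE placeholder.
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(policy_head.parameters()), lr=1e-4)
obs, text = torch.randn(8, OBS_DIM), torch.randn(8, TEXT_DIM)
dist = torch.distributions.Categorical(logits=policy_head(encoder(obs, text)))
action = dist.sample()
reward = torch.randn(8)                      # environment reward (random here)
loss = -(dist.log_prob(action) * reward).mean()
opt.zero_grad(); loss.backward(); opt.step()
```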
Related papers
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks set in textual worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without requiring multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states with respect to history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
- Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling [47.7950860342515]
LexiContrastive Grounding (LCG) is a grounded language learning procedure that leverages visual supervision to improve textual representations.
LCG outperforms standard language-only models in learning efficiency.
It improves upon vision-and-language learning procedures including CLIP, GIT, Flamingo, and Vokenization.
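A lexicon-level contrastive objective of this flavor can be sketched in a few lines; the function below is a generic InfoNCE pairing of word embeddings with co-occurring image features, an assumed simplification rather than the LCG paper's implementation.

```python
# A minimal sketch of a lexicon-level contrastive grounding objective,
# assuming precomputed word embeddings and image features.
import torch
import torch.nn.functional as F

def contrastive_grounding_loss(word_emb, img_feat, temperature=0.07):
    """InfoNCE loss pairing each word with its co-occurring image feature."""
    w = F.normalize(word_emb, dim=-1)        # (batch, dim)
    v = F.normalize(img_feat, dim=-1)        # (batch, dim)
    logits = w @ v.t() / temperature         # similarity of every pair
    targets = torch.arange(len(w))           # i-th word matches i-th image
    return F.cross_entropy(logits, targets)

loss = contrastive_grounding_loss(torch.randn(8, 64), torch.randn(8, 64))
```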
arXiv Detail & Related papers (2024-03-21T16:52:01Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- Unsupervised Improvement of Factual Knowledge in Language Models [4.5788796239850225]
Masked language modeling plays a key role in pretraining large language models.
We propose an approach for influencing pretraining in a way that can improve language model performance on a variety of knowledge-intensive tasks.
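For reference, the masked-language-modeling objective mentioned above can be sketched as follows; the toy vocabulary, 15% mask rate, and two-layer model are stand-ins, not the paper's setup.

```python
# A minimal sketch of the masked-language-modeling pretraining objective.
import torch
import torch.nn as nn

VOCAB, MASK_ID, DIM = 1000, 0, 64
model = nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Linear(DIM, VOCAB))

tokens = torch.randint(1, VOCAB, (4, 16))     # a batch of token sequences
mask = torch.rand(tokens.shape) < 0.15        # mask ~15% of positions
inputs = tokens.masked_fill(mask, MASK_ID)
logits = model(inputs)                        # predict the original tokens
loss = nn.functional.cross_entropy(
    logits[mask], tokens[mask])               # loss on masked positions only
```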
arXiv Detail & Related papers (2023-04-04T07:37:06Z)
- Skill Induction and Planning with Latent Language [94.55783888325165]
We formulate a generative model of action sequences in which goals generate sequences of high-level subtask descriptions.
We describe how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks.
In trained models, the space of natural language commands indexes a library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals.
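The planning loop this describes can be sketched with a toy skill library; the rule-based planner below stands in for the paper's learned generative model and is purely illustrative.

```python
# A minimal sketch of planning with a language-indexed skill library
# (requires Python 3.9+ for the dict-merge operator).
SKILLS = {
    "pick up the mug":  lambda state: state | {"holding": "mug"},
    "walk to the sink": lambda state: state | {"at": "sink"},
    "turn on the tap":  lambda state: state | {"tap": "on"},
}

def plan(goal):
    """Map a goal to a sequence of high-level subtask descriptions."""
    if goal == "wash the mug":
        return ["pick up the mug", "walk to the sink", "turn on the tap"]
    raise ValueError(f"no plan for goal: {goal!r}")

state = {"at": "table"}
for subtask in plan("wash the mug"):   # execute the named skills in order
    state = SKILLS[subtask](state)
print(state)   # {'at': 'sink', 'holding': 'mug', 'tap': 'on'}
```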
arXiv Detail & Related papers (2021-10-04T15:36:32Z)
- Curriculum learning for language modeling [2.2475845406292714]
Language models have proven transformational for the natural language processing community.
These models have proven expensive, energy-intensive, and challenging to train.
Curriculum learning is a method that employs a structured training regime instead.
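A minimal curriculum of this kind can be sketched by ranking examples with an assumed difficulty measure (here, sequence length) and growing the training pool in stages; the schedule is illustrative, not the paper's.

```python
# A minimal sketch of a length-based training curriculum.
def curriculum_stages(corpus, n_stages=3):
    """Yield progressively harder training sets, easiest examples first."""
    ranked = sorted(corpus, key=len)              # shorter = easier
    step = max(1, len(ranked) // n_stages)
    for stage in range(1, n_stages + 1):
        end = len(ranked) if stage == n_stages else stage * step
        yield ranked[:end]                        # grow the pool each stage

corpus = ["a cat", "the cat sat", "the cat sat on the mat quietly", "hi"]
for i, batch in enumerate(curriculum_stages(corpus), 1):
    print(f"stage {i}: {batch}")                  # train on `batch` here
```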
arXiv Detail & Related papers (2021-08-04T16:53:43Z)
- Language Models are Few-Shot Butlers [0.2538209532048867]
We introduce a two-stage procedure to learn from a small set of demonstrations and further improve by interacting with an environment.
We show that language models fine-tuned with only 1.2% of the expert demonstrations and a simple reinforcement learning algorithm achieve a 51% absolute improvement in success rate over existing methods in the ALFWorld environment.
arXiv Detail & Related papers (2021-04-16T08:47:07Z)
- Learning Spoken Language Representations with Neural Lattice Language Modeling [39.50831917042577]
We propose a framework that trains neural lattice language models to provide contextualized representations for spoken language understanding tasks.
The proposed two-stage pre-training approach reduces the demand for speech data and improves training efficiency.
arXiv Detail & Related papers (2020-07-06T10:38:03Z)
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
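A sketch of this style of augmentation, assuming the Hugging Face transformers pipeline with GPT-2 as a stand-in generator; the prompt format and post-processing are illustrative, not the paper's method.

```python
# A minimal sketch of LM-based utterance augmentation for SLU.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def augment(utterance, n=3):
    """Sample n new utterance variants seeded by a labeled example."""
    prompt = f"Paraphrase: {utterance}\nParaphrase:"
    outputs = generator(prompt, max_new_tokens=20,
                        num_return_sequences=n, do_sample=True)
    # keep only the newly generated continuation of each sample
    return [o["generated_text"][len(prompt):].strip() for o in outputs]

print(augment("book a flight to boston tomorrow"))
```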
arXiv Detail & Related papers (2020-04-29T04:07:12Z)