PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping
Pixels to Rewards
- URL: http://arxiv.org/abs/2007.15543v2
- Date: Thu, 19 Nov 2020 13:42:41 GMT
- Title: PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping
Pixels to Rewards
- Authors: Prasoon Goyal, Scott Niekum, Raymond J. Mooney
- Abstract summary: We propose a model that maps pixels to rewards, given a free-form natural language description of the task.
Experiments on the Meta-World robot manipulation domain show that language-based rewards significantly improve the sample efficiency of policy learning.
- Score: 40.1007184209417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL), particularly in sparse reward settings, often
requires prohibitively large numbers of interactions with the environment,
thereby limiting its applicability to complex problems. To address this,
several prior approaches have used natural language to guide the agent's
exploration. However, these approaches typically operate on structured
representations of the environment, and/or assume some structure in the natural
language commands. In this work, we propose a model that directly maps pixels
to rewards, given a free-form natural language description of the task, which
can then be used for policy learning. Our experiments on the Meta-World robot
manipulation domain show that language-based rewards significantly improve the
sample efficiency of policy learning, both in sparse and dense reward settings.
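A minimal, hypothetical sketch of the idea described in the abstract (not the authors' released PixL2R code): a model that encodes a short sequence of pixel observations and a tokenized free-form task description, and outputs a scalar compatibility score that can be added to the environment reward as a shaping term during policy learning. All names and hyperparameters here (PixelLanguageReward, hidden_dim, the 0.1 shaping weight) are illustrative assumptions.

# Hypothetical sketch, assuming 3x64x64 RGB frames and a simple token vocabulary;
# not the authors' architecture.
import torch
import torch.nn as nn

class PixelLanguageReward(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        # Per-frame CNN encoder.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden_dim),
        )
        # Temporal encoder over frame features and an encoder for the description.
        self.traj_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lang_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Scores trajectory/description compatibility; interpreted as a reward signal.
        self.scorer = nn.Linear(2 * hidden_dim, 1)

    def forward(self, frames, tokens):
        # frames: (B, T, 3, 64, 64); tokens: (B, L) integer token ids.
        B, T = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).view(B, T, -1)
        _, traj_h = self.traj_rnn(feats)
        _, lang_h = self.lang_rnn(self.embed(tokens))
        return self.scorer(torch.cat([traj_h[-1], lang_h[-1]], dim=-1)).squeeze(-1)

# Usage: shape a sparse environment reward with the language-based score.
model = PixelLanguageReward(vocab_size=1000)
frames = torch.randn(1, 8, 3, 64, 64)        # recent pixel observations
tokens = torch.randint(0, 1000, (1, 12))     # e.g. "push the green button", tokenized
env_reward = 0.0                             # sparse reward from the environment
shaped_reward = env_reward + 0.1 * model(frames, tokens).item()

The GRU/CNN encoders and the shaping weight are placeholders; the key point is that the learned score depends only on pixels and language, so it can provide a dense learning signal even when the environment reward is sparse.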
Related papers
- Interpretable Robotic Manipulation from Language [11.207620790833271]
We introduce an explainable behavior cloning agent, named Ex-PERACT, specifically designed for manipulation tasks.
At the top level, the model learns a discrete skill code, while at the bottom level, the policy network casts the problem into a voxelized grid and maps discretized actions onto that grid.
We evaluate our method across eight challenging manipulation tasks utilizing the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.
arXiv Detail & Related papers (2024-05-27T11:02:21Z) - Large Language Models as Generalizable Policies for Embodied Tasks [50.870491905776305]
We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks.
Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment.
arXiv Detail & Related papers (2023-10-26T18:32:05Z) - Improving Policy Learning via Language Dynamics Distillation [87.27583619910338]
We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions.
We show that language descriptions in demonstrations improve sample-efficiency and generalization across environments.
arXiv Detail & Related papers (2022-09-30T19:56:04Z) - Semantic Exploration from Language Abstractions and Pretrained
Representations [23.02024937564099]
Effective exploration is a challenge in reinforcement learning (RL).
We define novelty using semantically meaningful state abstractions.
We evaluate vision-language representations, pretrained on natural image captioning datasets.
arXiv Detail & Related papers (2022-04-08T17:08:00Z) - Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z) - Learning Invariable Semantical Representation from Language for
Extensible Policy Generalization [4.457682773596843]
We propose a method, called element randomization, to learn semantically invariant representations.
We theoretically prove the feasibility of learning semantically invariant representations through randomization.
Experiments on challenging long-horizon tasks show that our low-level policy generalizes reliably to tasks despite environment changes.
arXiv Detail & Related papers (2022-01-26T08:04:27Z) - Neural Abstructions: Abstractions that Support Construction for Grounded
Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding.
We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model.
We show that with this method a user population is able to build a semantic modification for an open-ended house task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z) - Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z) - Ask Your Humans: Using Human Instructions to Improve Generalization in
Reinforcement Learning [32.82030512053361]
We propose the use of step-by-step human demonstrations in the form of natural language instructions and action trajectories.
We find that human demonstrations help solve the most complex tasks.
We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting.
arXiv Detail & Related papers (2020-11-01T14:39:46Z) - Inverse Reinforcement Learning with Natural Language Goals [8.972202854038382]
We propose a novel inverse reinforcement learning algorithm to learn a language-conditioned policy and reward function.
Our algorithm outperforms multiple baselines by a large margin on a vision-based natural language instruction following dataset.
arXiv Detail & Related papers (2020-08-16T14:43:49Z)