Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning
for Robotics
- URL: http://arxiv.org/abs/2204.04308v1
- Date: Fri, 8 Apr 2022 22:01:36 GMT
- Title: Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning
for Robotics
- Authors: Frank Röder, Manfred Eppe and Stefan Wermter
- Abstract summary: This paper focuses on robotic reinforcement learning with sparse rewards for natural language goal representations.
We first present a mechanism for hindsight instruction replay utilizing expert feedback.
Second, we propose a seq2seq model to generate linguistic hindsight instructions.
- Score: 14.863872352905629
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on robotic reinforcement learning with sparse rewards for
natural language goal representations. An open problem is the
sample-inefficiency that stems from the compositionality of natural language,
and from the grounding of language in sensory data and actions. We address
these issues with three contributions. We first present a mechanism for
hindsight instruction replay utilizing expert feedback. Second, we propose a
seq2seq model to generate linguistic hindsight instructions. Finally, we
present a novel class of language-focused learning tasks. We show that
hindsight instructions improve the learning performance, as expected. In
addition, we also provide an unexpected result: We show that the learning
performance of our agent can be improved by one third if, in a sense, the agent
learns to talk to itself in a self-supervised manner. We achieve this by
learning to generate linguistic instructions that would have been appropriate
as a natural language goal for an originally unintended behavior. Our results
indicate that the performance gain increases with the task-complexity.
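The first two contributions lend themselves to a brief illustration. Below is a minimal Python sketch of hindsight instruction replay, under stated assumptions: `Trajectory`, `replay_buffer`, `describe_behavior`, and `store_with_hindsight` are hypothetical names invented for this sketch, not the authors' code, and `describe_behavior` abstracts over both instruction sources, the expert feedback of the first contribution and the trained seq2seq generator of the second.

```python
# Minimal sketch of hindsight instruction replay; names are illustrative
# stand-ins, not the authors' implementation.
from dataclasses import dataclass

@dataclass
class Trajectory:
    observations: list
    actions: list
    instruction: str   # natural language goal the agent was given
    success: bool      # whether the instruction was actually satisfied

replay_buffer: list = []

def describe_behavior(trajectory: Trajectory) -> str:
    """Stand-in for the instruction source: an expert in the feedback
    setting, or the trained seq2seq model in the self-supervised setting.
    Returns an instruction matching what the agent actually did."""
    raise NotImplementedError

def store_with_hindsight(trajectory: Trajectory) -> None:
    # Store the original episode with its sparse reward (0 on failure).
    replay_buffer.append((trajectory, 1.0 if trajectory.success else 0.0))
    if not trajectory.success:
        # Relabel the failed episode with an instruction that *would have
        # been* an appropriate goal for the achieved behavior, and store
        # it as a successful episode (reward 1).
        relabeled = Trajectory(
            observations=trajectory.observations,
            actions=trajectory.actions,
            instruction=describe_behavior(trajectory),
            success=True,
        )
        replay_buffer.append((relabeled, 1.0))
```

The relabeling step mirrors hindsight experience replay: a failed episode becomes a successful training example for whatever goal the behavior happened to satisfy.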
Related papers
- Punctuation Restoration Improves Structure Understanding without
Supervision [6.4736137270915215]
We show that punctuation restoration as a learning objective improves in- and out-of-distribution performance on structure-related tasks.
Punctuation restoration is an effective learning objective that can improve structure understanding and yield more robust structure-aware representations of natural language.
arXiv Detail & Related papers (2024-02-13T11:22:52Z) - Is Feedback All You Need? Leveraging Natural Language Feedback in
Goal-Conditioned Reinforcement Learning [54.31495290436766]
We extend BabyAI to automatically generate language feedback from environment dynamics and goal condition success.
We modify the Decision Transformer architecture to take advantage of this additional signal (see the sketch after this list).
We find that training with language feedback either in place of or in addition to the return-to-go or goal descriptions improves agents' generalisation performance.
arXiv Detail & Related papers (2023-12-07T22:33:34Z) - Language-Driven Representation Learning for Robotics [115.93273609767145]
Recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks.
We introduce a framework for language-driven representation learning from human videos and captions.
We find that Voltron's language-driven learning outperforms the prior state of the art, especially on targeted problems requiring higher-level control.
arXiv Detail & Related papers (2023-02-24T17:29:31Z) - How to talk so your robot will learn: Instructions, descriptions, and
pragmatics [14.289220844201695]
We study how a human might communicate preferences over behaviors.
We show that in traditional reinforcement learning settings, pragmatic social learning can integrate with and accelerate individual learning.
Our findings suggest that social learning from a wider range of language is a promising approach for value alignment and reinforcement learning more broadly.
arXiv Detail & Related papers (2022-06-16T01:33:38Z) - Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z) - Learning Language-Conditioned Robot Behavior from Offline Data and
Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z) - Ask Your Humans: Using Human Instructions to Improve Generalization in
Reinforcement Learning [32.82030512053361]
We propose the use of step-by-step human demonstrations in the form of natural language instructions and action trajectories.
We find that human demonstrations help solve the most complex tasks.
We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting.
arXiv Detail & Related papers (2020-11-01T14:39:46Z) - Learning Rewards from Linguistic Feedback [30.30912759796109]
We explore unconstrained natural language feedback as a learning signal for artificial agents.
We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict latent rewards.
arXiv Detail & Related papers (2020-09-30T14:51:00Z) - Inverse Reinforcement Learning with Natural Language Goals [8.972202854038382]
We propose a novel inverse reinforcement learning algorithm to learn a language-conditioned policy and reward function.
Our algorithm outperforms multiple baselines by a large margin on a vision-based natural language instruction following dataset.
arXiv Detail & Related papers (2020-08-16T14:43:49Z) - Semantics-Aware Inferential Network for Natural Language Understanding [79.70497178043368]
We propose a Semantics-Aware Inferential Network (SAIN) to meet such a motivation.
Taking explicit contextualized semantics as a complementary input, the inferential module of SAIN enables a series of reasoning steps over semantic clues.
Our model achieves significant improvement on 11 tasks including machine reading comprehension and natural language inference.
arXiv Detail & Related papers (2020-04-28T07:24:43Z) - On the interaction between supervision and self-play in emergent
communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)
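The Decision Transformer modification referenced in the "Is Feedback All You Need?" entry above can be pictured as one extra token modality per timestep. The sketch below is a hypothetical simplification, not that paper's architecture: the class and parameter names are invented, the feedback is reduced to a single token id per step, and causal masking and timestep embeddings are omitted for brevity.

```python
# Hypothetical sketch: extending a Decision-Transformer-style token sequence
# with an embedded language-feedback token per timestep.
import torch
import torch.nn as nn

class FeedbackDecisionTransformer(nn.Module):
    def __init__(self, state_dim: int, act_dim: int,
                 feedback_vocab: int, d_model: int = 128):
        super().__init__()
        self.embed_return = nn.Linear(1, d_model)
        self.embed_feedback = nn.Embedding(feedback_vocab, d_model)  # extra modality
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, returns, feedback_ids, states, actions):
        # Shapes: returns (B, T, 1), feedback_ids (B, T) long,
        # states (B, T, state_dim), actions (B, T, act_dim).
        tokens = torch.stack(
            (self.embed_return(returns),
             self.embed_feedback(feedback_ids),
             self.embed_state(states),
             self.embed_action(actions)),
            dim=2,
        )  # (B, T, 4, d_model): one feedback token inserted per timestep
        b, t, k, d = tokens.shape
        hidden = self.transformer(tokens.reshape(b, t * k, d))
        # Predict each action from the hidden state of that step's state token.
        return self.predict_action(hidden.reshape(b, t, k, d)[:, :, 2])
```

The design choice illustrated here is that feedback tokens sit in the same attention stream as returns, states, and actions, so action prediction can condition on the feedback exactly as it conditions on return-to-go or goal descriptions.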
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.