Is Feedback All You Need? Leveraging Natural Language Feedback in
Goal-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2312.04736v1
- Date: Thu, 7 Dec 2023 22:33:34 GMT
- Title: Is Feedback All You Need? Leveraging Natural Language Feedback in
Goal-Conditioned Reinforcement Learning
- Authors: Sabrina McCallum, Max Taylor-Davies, Stefano V. Albrecht, Alessandro
Suglia
- Abstract summary: We extend BabyAI to automatically generate language feedback from environment dynamics and goal condition success.
We modify the Decision Transformer architecture to take advantage of this additional signal.
We find that training with language feedback either in place of or in addition to the return-to-go or goal descriptions improves agents' generalisation performance.
- Score: 54.31495290436766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite numerous successes, the field of reinforcement learning (RL) remains
far from matching the impressive generalisation power of human behaviour
learning. One possible way to help bridge this gap may be to provide RL agents with
richer, more human-like feedback expressed in natural language. To investigate
this idea, we first extend BabyAI to automatically generate language feedback
from the environment dynamics and goal condition success. Then, we modify the
Decision Transformer architecture to take advantage of this additional signal.
We find that training with language feedback either in place of or in addition
to the return-to-go or goal descriptions improves agents' generalisation
performance, and that agents can benefit from feedback even when it is only
available during training but not at inference.
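A minimal sketch of the kind of architecture change described above, assuming each feedback sentence is pre-encoded into a fixed-size embedding and fed to a Decision Transformer-style model in place of the return-to-go token; module names and dimensions are illustrative, not the authors' implementation.

    # Minimal sketch (PyTorch): a Decision Transformer-style model that conditions on
    # embedded language feedback instead of return-to-go tokens. Illustrative only.
    import torch
    import torch.nn as nn

    class FeedbackDecisionTransformer(nn.Module):
        def __init__(self, state_dim, act_dim, feedback_dim,
                     d_model=128, n_layers=3, n_heads=4, max_len=64):
            super().__init__()
            self.embed_state = nn.Linear(state_dim, d_model)
            self.embed_action = nn.Embedding(act_dim, d_model)
            self.embed_feedback = nn.Linear(feedback_dim, d_model)  # sentence embedding of the feedback
            self.embed_timestep = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=4 * d_model, batch_first=True)
            self.transformer = nn.TransformerEncoder(layer, n_layers)
            self.predict_action = nn.Linear(d_model, act_dim)

        def forward(self, states, actions, feedback, timesteps):
            # states: (B, T, state_dim); actions, timesteps: (B, T) int64;
            # feedback: (B, T, feedback_dim) pre-encoded language feedback per step
            B, T = actions.shape
            t = self.embed_timestep(timesteps)
            tokens = torch.stack([
                self.embed_feedback(feedback) + t,   # feedback token replaces return-to-go
                self.embed_state(states) + t,
                self.embed_action(actions) + t,
            ], dim=2).reshape(B, 3 * T, -1)          # interleave (f_t, s_t, a_t)
            mask = torch.triu(torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
            h = self.transformer(tokens, mask=mask)
            return self.predict_action(h[:, 1::3])   # predict a_t from the state-token positions

In the training-only-feedback setting, the feedback input could simply be zeroed out (or replaced with a learned placeholder embedding) at inference time.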
Related papers
- Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use [16.425032085699698]
It is desirable for embodied agents to have the ability to leverage human language to gain explicit or implicit knowledge for learning tasks.
However, it is not yet clear how to incorporate rich language use to facilitate task learning.
This paper studies different types of language inputs in facilitating reinforcement learning.
arXiv Detail & Related papers (2024-10-31T17:59:52Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Improving Code Generation by Training with Natural Language Feedback [69.52985513422381]
We formalize an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF).
ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient.
We use ILF to improve a Codegen-Mono 6.1B model's pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark.
arXiv Detail & Related papers (2023-03-28T16:15:31Z)
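The pseudocode below sketches one round of an ILF-style loop, assuming unit tests are available for each task; all helper callables (get_human_feedback, passes_tests, fine_tune) are placeholders, not the authors' released implementation.

    # One round of an ILF-style loop (illustrative): generate programs, collect written
    # feedback on failures, ask the model to refine, and fine-tune only on refinements
    # that actually pass the unit tests.
    def ilf_round(model, tasks, get_human_feedback, passes_tests, fine_tune):
        training_pairs = []
        for task in tasks:
            program = model.generate(task.prompt)
            if passes_tests(program, task.tests):
                continue                                  # only failed attempts receive feedback
            feedback = get_human_feedback(task, program)  # short natural-language critique
            refinement = model.generate(
                task.prompt + "\n# Previous attempt:\n" + program
                + "\n# Feedback:\n" + feedback + "\n# Improved solution:\n"
            )
            if passes_tests(refinement, task.tests):
                training_pairs.append((task.prompt, refinement))
        # imitation step: fine-tune on (prompt -> verified refinement) pairs only
        return fine_tune(model, training_pairs)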
- Reflexion: Language Agents with Verbal Reinforcement Learning [44.85337947858337]
Reflexion is a novel framework that reinforces language agents not by updating weights, but through linguistic feedback.
It is flexible enough to incorporate various types (scalar values or free-form language) and sources (external or internally simulated) of feedback signals.
For example, Reflexion achieves a 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 that achieves 80%.
arXiv Detail & Related papers (2023-03-20T18:08:50Z)
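The loop below is a rough sketch of this style of verbal reinforcement, assuming a generic llm() callable and a task-specific evaluate() checker; both are placeholders rather than Reflexion's actual interface.

    # Reflexion-style loop (illustrative): no weight updates, just an episodic memory of
    # self-generated verbal reflections that is prepended to later attempts.
    def reflexion_loop(llm, task, evaluate, max_trials=4):
        reflections = []                                  # persistent verbal memory
        attempt = None
        for _ in range(max_trials):
            context = task + "\n\nLessons from earlier attempts:\n" + "\n".join(reflections)
            attempt = llm(context)
            ok, error_signal = evaluate(attempt)          # scalar or free-form feedback
            if ok:
                break
            # turn the feedback into a stored reflection instead of a gradient update
            reflections.append(llm(
                f"The attempt below failed with feedback: {error_signal}\n"
                f"Attempt:\n{attempt}\nWrite a short lesson for the next try."
            ))
        return attempt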
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
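Below is a small sketch of how such feedback-conditioned training sequences might be assembled; the exact wording of the hindsight markers is illustrative rather than the paper's templates.

    # Chain-of-Hindsight-style data construction (illustrative): keep both the preferred and
    # the dispreferred answer and rewrite them as one sequence whose natural-language markers
    # tell the model which output was better. The model is fine-tuned with a standard
    # language-modelling loss on such text and later prompted with the positive marker.
    def build_coh_example(prompt, good_answer, bad_answer):
        return (
            f"{prompt}\n"
            f"A bad answer is: {bad_answer}\n"
            f"A good answer is: {good_answer}\n"
        )

    example = build_coh_example(
        prompt="Summarise the article in one sentence.",
        good_answer="The study finds that language feedback improves generalisation.",
        bad_answer="The article talks about some experiments.",
    )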
- Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics [14.863872352905629]
This paper focuses on robotic reinforcement learning with sparse rewards for natural language goal representations.
We first present a mechanism for hindsight instruction replay utilizing expert feedback.
Second, we propose a seq2seq model to generate linguistic hindsight instructions.
arXiv Detail & Related papers (2022-04-08T22:01:36Z)
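A minimal sketch of hindsight instruction relabelling in this spirit, assuming episodes are stored as dictionaries and that describe_outcome() is a placeholder standing in for the expert feedback or the learned seq2seq describer.

    # Hindsight instruction replay (illustrative): when an episode fails its original language
    # goal, describe what was actually achieved and store the episode again as if that
    # description had been the instruction, turning failures into sparse-reward successes.
    def relabel_with_hindsight(episode, describe_outcome, replay_buffer):
        replay_buffer.append(episode)                     # original (usually zero-reward) episode
        if episode["success"]:
            return
        hindsight_instruction = describe_outcome(episode["final_state"])  # e.g. "push the red block left"
        relabelled = dict(
            episode,
            instruction=hindsight_instruction,
            success=True,
            rewards=episode["rewards"][:-1] + [1.0],      # reward granted for the relabelled goal
        )
        replay_buffer.append(relabelled)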
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
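The snippet below sketches the preference-based reward learning step this family of methods relies on, using a Bradley-Terry style loss over pairs of behaviour segments; the reward_net interface and tensor shapes are assumptions for illustration, not PEBBLE's code.

    # Preference-based reward learning (illustrative): fit a reward network to teacher
    # preferences over segment pairs, then relabel the replay buffer with the learned reward
    # so that off-policy updates stay consistent as the reward model changes.
    import torch
    import torch.nn.functional as F

    def preference_loss(reward_net, seg_a, seg_b, prefer_a):
        # seg_a, seg_b: (B, T, obs_dim + act_dim) segments; prefer_a: (B,) with 1.0 if a preferred
        ret_a = reward_net(seg_a).sum(dim=1).squeeze(-1)  # predicted return of each segment
        ret_b = reward_net(seg_b).sum(dim=1).squeeze(-1)
        logits = torch.stack([ret_a, ret_b], dim=1)       # Bradley-Terry: softmax over returns
        return F.cross_entropy(logits, (1 - prefer_a).long())

    def relabel_rewards(reward_net, replay_buffer):
        with torch.no_grad():
            for transition in replay_buffer:              # transition["obs_act"]: (obs_dim + act_dim,)
                transition["reward"] = reward_net(transition["obs_act"][None, None, :]).item()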
- Influencing Reinforcement Learning through Natural Language Guidance [4.227540427595989]
We explore how natural language advice can be used to provide a richer feedback signal to a reinforcement learning agent.
Policy shaping typically employs a human feedback policy to help an agent learn more about how to achieve its goal.
In our case, we replace this human feedback policy with a policy generated from natural language advice.
arXiv Detail & Related papers (2021-04-04T00:23:39Z)
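A toy sketch of policy shaping with an advice-derived distribution follows: the learner's action probabilities are multiplied with those induced by the advice and renormalised. The advice distribution is hard-coded here; in the paper it would come from a model trained on natural language advice.

    # Policy shaping (illustrative): combine the learner's policy with an advice policy by
    # multiplying the two distributions over actions and renormalising.
    import numpy as np

    def shaped_action_probs(agent_probs, advice_probs, eps=1e-8):
        combined = agent_probs * advice_probs + eps        # element-wise product of the two policies
        return combined / combined.sum()

    agent_probs = np.array([0.5, 0.3, 0.2])                # learner's current policy over 3 actions
    advice_probs = np.array([0.1, 0.8, 0.1])               # induced by advice such as "go through the door"
    print(shaped_action_probs(agent_probs, advice_probs))  # advice shifts mass to the second action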
- Learning Rewards from Linguistic Feedback [30.30912759796109]
We explore unconstrained natural language feedback as a learning signal for artificial agents.
We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict latent rewards.
arXiv Detail & Related papers (2020-09-30T14:51:00Z)
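As a rough illustration of the sentiment-based "literal" learner, the sketch below maps free-form feedback to a clipped scalar reward with a toy lexicon; the actual models in the paper are trained from data rather than hand-written.

    # Sentiment-to-reward mapping (toy illustration): score feedback by counting positive
    # and negative words and clip the result to a per-step reward in [-1, 1].
    POSITIVE = {"good", "great", "nice", "yes", "exactly", "perfect"}
    NEGATIVE = {"no", "wrong", "bad", "stop", "worse", "not"}

    def sentiment_reward(feedback: str) -> float:
        words = [w.strip(".,!?") for w in feedback.lower().split()]
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        return max(-1.0, min(1.0, float(score)))

    print(sentiment_reward("nice, that was exactly right"))  # -> 1.0
    print(sentiment_reward("no, not that one"))              # -> -1.0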
This list is automatically generated from the titles and abstracts of the papers on this site.