PREDILECT: Preferences Delineated with Zero-Shot Language-based
Reasoning in Reinforcement Learning
- URL: http://arxiv.org/abs/2402.15420v1
- Date: Fri, 23 Feb 2024 16:30:05 GMT
- Title: PREDILECT: Preferences Delineated with Zero-Shot Language-based
Reasoning in Reinforcement Learning
- Authors: Simon Holk, Daniel Marta, Iolanda Leite
- Abstract summary: Preference-based reinforcement learning (RL) has emerged as a new field in robot learning.
We use the zero-shot capabilities of a large language model (LLM) to reason from the text provided by humans.
- In both a simulated scenario and a user study, we demonstrate the effectiveness of our approach by analyzing the feedback and its implications.
- Score: 2.7387720378113554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preference-based reinforcement learning (RL) has emerged as a new field in
robot learning, where humans play a pivotal role in shaping robot behavior by
expressing preferences on different sequences of state-action pairs. However,
formulating realistic policies for robots demands responses from humans to an
extensive array of queries. In this work, we approach the sample-efficiency
challenge by expanding the information collected per query to contain both
preferences and optional text prompting. To accomplish this, we leverage the
zero-shot capabilities of a large language model (LLM) to reason from the text
provided by humans. To accommodate the additional query information, we
reformulate the reward learning objectives to contain flexible highlights --
state-action pairs that contain relatively high information and are related to
the features processed in a zero-shot fashion from a pretrained LLM. In both a
simulated scenario and a user study, we demonstrate the effectiveness of our
approach by analyzing the feedback and its implications. Additionally, the
collected feedback serves to train a robot on socially compliant trajectories
in a simulated social navigation environment. We provide video examples of the
trained policies at https://sites.google.com/view/rl-predilect
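To make the reward-learning idea concrete, below is a minimal sketch of a Bradley-Terry-style preference loss with an optional per-step highlight weight. The Bradley-Terry pairwise objective is the standard formulation in preference-based RL; the highlight encoding and the `(1 + w)` up-weighting are illustrative assumptions based on the abstract's description of flexible highlights, not PREDILECT's exact objective.

```python
# Minimal sketch: Bradley-Terry preference loss with LLM-derived highlights.
# The highlight up-weighting is an assumption inferred from the abstract,
# not PREDILECT's exact objective.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # Per-step reward r(s, a) over a segment of state-action pairs.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(r_a, r_b, pref_a, w_a=None, w_b=None):
    """Bradley-Terry loss for one preference query.

    r_a, r_b: (T,) per-step rewards of segments A and B.
    pref_a:   1.0 if the human preferred A, else 0.0.
    w_a, w_b: optional (T,) highlight weights in [0, 1] for the state-action
              pairs the LLM flagged as informative (assumed encoding).
    """
    if w_a is not None:
        r_a = r_a * (1.0 + w_a)  # emphasize highlighted steps
    if w_b is not None:
        r_b = r_b * (1.0 + w_b)
    log_p = torch.log_softmax(torch.stack([r_a.sum(), r_b.sum()]), dim=0)
    return -(pref_a * log_p[0] + (1.0 - pref_a) * log_p[1])
```

When a query arrives without text, the weights stay `None` and the loss reduces to the usual preference-learning objective, so the text channel is strictly additive.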
Related papers
- LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [56.505551117094534]
Vision Language Models (VLMs) can process state information as visual-textual prompts and respond with policy decisions in text.
We propose LLaRA: Large Language and Robotics Assistant, a framework that formulates robot action policy as conversations.
arXiv Detail & Related papers (2024-06-28T17:59:12Z)
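A minimal sketch of the conversation-style policy interface the LLaRA summary describes: the state is rendered as a visual-textual prompt and the action is parsed back out of the model's text reply. The prompt format, the `move(dx, dy, dz)` action grammar, and the canned reply are hypothetical, not LLaRA's actual interface.

```python
# Hypothetical sketch of a conversation-style robot policy: the VLM sees an
# image plus an instruction and answers with an action in plain text.
import re

def build_prompt(instruction: str) -> str:
    return (f"<image>\nTask: {instruction}\n"
            "Reply with the next end-effector move as: move(dx, dy, dz)")

def parse_action(reply: str) -> tuple[float, float, float]:
    m = re.search(r"move\(\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*\)",
                  reply)
    if m is None:
        raise ValueError(f"unparseable action: {reply!r}")
    dx, dy, dz = map(float, m.groups())
    return dx, dy, dz

# Example round-trip with a canned reply standing in for the VLM call:
print(parse_action("move(0.05, -0.10, 0.00)"))  # (0.05, -0.1, 0.0)
```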
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present textscUltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z)
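Vision-language-action models such as RT-2 express robot actions as sequences of discrete tokens in the model's text vocabulary. The sketch below shows the de-tokenization step under assumed conventions; the 256-bin quantization and the field order are illustrative, not RT-2's published layout.

```python
# Sketch of the action-as-text idea behind vision-language-action models:
# the model emits discrete tokens that are de-quantized into a robot command.
# The 256-bin layout and field order here are illustrative assumptions.
import numpy as np

BINS = 256
LOW, HIGH = -1.0, 1.0  # assumed normalized action range

def detokenize(tokens: list[int]) -> np.ndarray:
    """Map integer action tokens in [0, BINS) back to continuous values."""
    t = np.asarray(tokens, dtype=np.float64)
    return LOW + (t + 0.5) * (HIGH - LOW) / BINS

# e.g. 7 tokens -> (dx, dy, dz, droll, dpitch, dyaw, gripper)
print(detokenize([128, 64, 200, 127, 127, 127, 255]))
```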
- LARG, Language-based Automatic Reward and Goal Generation [8.404316955848602]
We develop an approach that converts a text-based task description into its corresponding reward and goal-generation functions.
We evaluate our approach for robotic manipulation and demonstrate its ability to train and execute policies in a scalable manner.
arXiv Detail & Related papers (2023-06-19T14:52:39Z)
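A minimal sketch of language-based reward generation as the LARG summary describes it: prompt an LLM for a Python reward function and load the returned code. The `llm_complete` helper is a placeholder for any text-completion call, and the canned reply keeps the sketch runnable; neither is LARG's actual pipeline.

```python
# Sketch of language-based reward generation: ask an LLM to write a Python
# reward function for a task description, then load it.
PROMPT = """Write a Python function `reward(state)` for this task:
{task}
Return only code."""

def llm_complete(prompt: str) -> str:
    # Stand-in: a real system would call an LLM here. The canned reply
    # keeps the sketch runnable.
    return ("def reward(state):\n"
            "    # closer to the goal is better\n"
            "    return -abs(state['gripper_x'] - state['goal_x'])\n")

def generate_reward_fn(task: str):
    code = llm_complete(PROMPT.format(task=task))
    namespace: dict = {}
    exec(code, namespace)  # trusted-code assumption; sandbox in practice
    return namespace["reward"]

r = generate_reward_fn("move the gripper to the goal on the x axis")
print(r({"gripper_x": 0.2, "goal_x": 0.5}))  # -> -0.3 (up to float rounding)
```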
- Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z)
- Few-Shot Preference Learning for Human-in-the-Loop RL [13.773589150740898]
Motivated by the success of meta-learning, we pre-train preference models on prior task data and quickly adapt them for new tasks using only a handful of queries.
We reduce the amount of online feedback needed to train manipulation policies in Meta-World by 20×, and demonstrate the effectiveness of our method on a real Franka Panda robot.
arXiv Detail & Related papers (2022-12-06T23:12:26Z)
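A minimal sketch of the pre-train-then-adapt recipe: fit a preference model on plentiful prior-task comparisons, then fine-tune on a handful of new-task queries at a lower learning rate. The two-stage split, network, and step counts are illustrative assumptions, not the paper's meta-learning procedure.

```python
# Sketch: pre-train a preference model on prior tasks, then adapt with a
# handful of new-task queries. Stages and step counts are illustrative.
import torch
import torch.nn as nn

def bt_loss(model, seg_a, seg_b, pref_a):
    # Bradley-Terry loss over summed segment rewards (see the sketch above).
    log_p = torch.log_softmax(
        torch.stack([model(seg_a).sum(), model(seg_b).sum()]), dim=0)
    return -(pref_a * log_p[0] + (1 - pref_a) * log_p[1])

model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def train(queries, steps, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        for seg_a, seg_b, pref_a in queries:
            opt.zero_grad()
            bt_loss(model, seg_a, seg_b, pref_a).backward()
            opt.step()

prior = [(torch.randn(50, 8), torch.randn(50, 8), 1.0)]  # prior-task data
fresh = [(torch.randn(50, 8), torch.randn(50, 8), 0.0)]  # a few new queries
train(prior, steps=100, lr=1e-3)  # pre-train on prior task comparisons
train(fresh, steps=10, lr=1e-4)   # quick low-LR adaptation to the new task
```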
- Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
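A minimal sketch of kNN-style re-ranking with relevance feedback, assuming queries and documents share an embedding space: each candidate is rescored by interpolating its similarity to the query with its mean similarity to the user-marked relevant documents. The cosine metric and the `alpha` weight are assumptions.

```python
# Sketch: re-rank candidates by similarity to the query plus similarity to
# user-marked relevant documents. Embeddings and alpha are illustrative.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rerank(query, candidates, relevant, alpha=0.5):
    """candidates/relevant: lists of (doc_id, embedding). Higher is better."""
    def score(emb):
        fb = np.mean([cosine(emb, r) for _, r in relevant]) if relevant else 0.0
        return alpha * cosine(emb, query) + (1 - alpha) * fb
    return sorted(candidates, key=lambda d: score(d[1]), reverse=True)

rng = np.random.default_rng(0)
q = rng.normal(size=16)
docs = [(f"doc{i}", rng.normal(size=16)) for i in range(5)]
print([d for d, _ in rerank(q, docs, relevant=docs[:1])])
```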