PREDILECT: Preferences Delineated with Zero-Shot Language-based
Reasoning in Reinforcement Learning
- URL: http://arxiv.org/abs/2402.15420v1
- Date: Fri, 23 Feb 2024 16:30:05 GMT
- Title: PREDILECT: Preferences Delineated with Zero-Shot Language-based
Reasoning in Reinforcement Learning
- Authors: Simon Holk, Daniel Marta, Iolanda Leite
- Abstract summary: Preference-based reinforcement learning (RL) has emerged as a new field in robot learning.
We use the zero-shot capabilities of a large language model (LLM) to reason from the text provided by humans.
In both a simulated scenario and a user study, we reveal the effectiveness of our work by analyzing the feedback and its implications.
- Score: 2.7387720378113554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preference-based reinforcement learning (RL) has emerged as a new field in
robot learning, where humans play a pivotal role in shaping robot behavior by
expressing preferences on different sequences of state-action pairs. However,
formulating realistic policies for robots demands responses from humans to an
extensive array of queries. In this work, we approach the sample-efficiency
challenge by expanding the information collected per query to contain both
preferences and optional text prompting. To accomplish this, we leverage the
zero-shot capabilities of a large language model (LLM) to reason from the text
provided by humans. To accommodate the additional query information, we
reformulate the reward learning objectives to contain flexible highlights --
state-action pairs that contain relatively high information and are related to
the features processed in a zero-shot fashion from a pretrained LLM. In both a
simulated scenario and a user study, we reveal the effectiveness of our work by
analyzing the feedback and its implications. Additionally, the collective
feedback collected serves to train a robot on socially compliant trajectories
in a simulated social navigation landscape. We provide video examples of the
trained policies at https://sites.google.com/view/rl-predilect
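As a rough illustration of the idea in the abstract, the sketch below combines a standard Bradley-Terry preference loss over two trajectory segments with an auxiliary term on highlighted steps. It is not the objective from the paper: the function names, the `sign` convention (+1 for a step the text feedback marks as desirable, -1 for undesirable), the softplus form of the highlight term, and the weight `lam` are all assumptions made for this example.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_prob(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B,
    based on the summed predicted per-step rewards of each segment."""
    return sigmoid(sum(rewards_a) - sum(rewards_b))


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy on the preference label (1.0 = A preferred, 0.0 = B)."""
    diff = sum(rewards_a) - sum(rewards_b)
    # -log(sigmoid(d)) == softplus(-d) and -log(1 - sigmoid(d)) == softplus(d)
    return label * softplus(-diff) + (1.0 - label) * softplus(diff)


def highlight_term(rewards, highlights):
    """Hypothetical auxiliary term (an assumption, not the paper's formula).

    `highlights` is a list of (step_index, sign) pairs, where sign is +1 for
    a step flagged as desirable and -1 for a step flagged as undesirable.
    The term is small when the predicted reward agrees with the sign.
    """
    if not highlights:
        return 0.0
    total = sum(softplus(-sign * rewards[idx]) for idx, sign in highlights)
    return total / len(highlights)


def combined_loss(rewards_a, rewards_b, label,
                  highlights_a=(), highlights_b=(), lam=0.5):
    """Preference loss plus the weighted highlight terms of both segments."""
    aux = (highlight_term(rewards_a, list(highlights_a))
           + highlight_term(rewards_b, list(highlights_b)))
    return preference_loss(rewards_a, rewards_b, label) + lam * aux


if __name__ == "__main__":
    # Segment B is flagged at step 1 as undesirable (for example, passing
    # too close to a person); the human prefers segment A.
    seg_a = [0.2, 0.1, 0.3]
    seg_b = [0.1, 0.4, 0.0]
    print(combined_loss(seg_a, seg_b, label=1.0, highlights_b=[(1, -1)]))
```

In this sketch the highlight term only touches the flagged steps, so a query without any text feedback reduces to the plain preference loss.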
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment [73.14105098897696]
We propose Representation-Aligned Preference-based Learning (RAPL) to learn visual rewards from significantly less human preference feedback.
RAPL focuses on fine-tuning pre-trained vision encoders to align with the end-user's visual representation and then constructs a dense visual reward via feature matching.
We show that RAPL can learn rewards aligned with human preferences, more efficiently uses preference data, and generalizes across robot embodiments.
arXiv Detail & Related papers (2024-12-06T08:04:02Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z)
- LARG, Language-based Automatic Reward and Goal Generation [8.404316955848602]
We develop an approach that converts a text-based task description into its corresponding reward and goal-generation functions.
We evaluate our approach for robotic manipulation and demonstrate its ability to train and execute policies in a scalable manner.
arXiv Detail & Related papers (2023-06-19T14:52:39Z)
- SOCRATES: Text-based Human Search and Approach using a Robot Dog [6.168521568443759]
We propose a SOCratic model for Robots Approaching humans based on TExt System (SOCRATES).
We first present a Human Search Socratic Model that connects large pre-trained models in the language domain to solve the downstream task.
Then, we propose a hybrid learning-based framework for generating target-cordial robotic motion to approach a person.
arXiv Detail & Related papers (2023-02-10T15:35:24Z)
- Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z)
- Few-Shot Preference Learning for Human-in-the-Loop RL [13.773589150740898]
Motivated by the success of meta-learning, we pre-train preference models on prior task data and quickly adapt them for new tasks using only a handful of queries.
We reduce the amount of online feedback needed to train manipulation policies in Meta-World by 20×, and demonstrate the effectiveness of our method on a real Franka Panda Robot.
arXiv Detail & Related papers (2022-12-06T23:12:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.