Learning Reward for Robot Skills Using Large Language Models via Self-Alignment
- URL: http://arxiv.org/abs/2405.07162v3
- Date: Thu, 16 May 2024 02:37:29 GMT
- Title: Learning Reward for Robot Skills Using Large Language Models via Self-Alignment
- Authors: Yuwei Zeng, Yao Mu, Lin Shao
- Abstract summary: Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions.
We propose a method to learn rewards more efficiently in the absence of humans.
- Score: 11.639973274337274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning reward functions remains the bottleneck to equip a robot with a broad repertoire of skills. Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the proposed reward function can be imprecise and thus ineffective, requiring further grounding with environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: we first use the LLM to propose features and a parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learned reward functions based on execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement in training efficacy and efficiency, while consuming significantly fewer GPT tokens than the alternative mutation-based method.
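To make the self-alignment loop above concrete, here is a minimal, self-contained sketch. It assumes a linear reward over LLM-proposed trajectory features and a pairwise hinge loss over the LLM's ranking; the toy data, helper names, and finite-difference optimizer are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(theta, feats):
    # Linear reward over LLM-proposed trajectory features (one possible parameterization).
    return feats @ theta

def ranking_loss(theta, traj_feats, llm_order):
    # Pairwise hinge loss: penalize trajectory pairs that the learned reward
    # orders differently from the LLM's ranking (llm_order lists indices best-first).
    loss = 0.0
    for a in range(len(llm_order)):
        for b in range(a + 1, len(llm_order)):
            gap = reward(theta, traj_feats[llm_order[a]]) - reward(theta, traj_feats[llm_order[b]])
            loss += max(0.0, 1.0 - gap)
    return loss

# Toy "execution feedback": six executed trajectories summarized by three
# features each, plus a ranking we pretend came from querying the LLM.
traj_feats = rng.normal(size=(6, 3))
llm_order = [2, 0, 5, 1, 4, 3]

theta = np.zeros(3)
eps, lr = 1e-4, 0.05
for _ in range(200):  # iterate on the ranking-inconsistency loss
    grad = np.zeros_like(theta)
    for i in range(len(theta)):  # finite-difference gradient, for brevity
        d = np.zeros_like(theta)
        d[i] = eps
        grad[i] = (ranking_loss(theta + d, traj_feats, llm_order)
                   - ranking_loss(theta - d, traj_feats, llm_order)) / (2 * eps)
    theta -= lr * grad

print("aligned reward weights:", theta)
```

In the full method, each round would presumably also execute the current policy to collect fresh rollouts for the LLM to rank; the toy loop above isolates the parameter-fitting step.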
Related papers
- Few-shot In-Context Preference Learning Using Large Language Models [15.84585737510038]
Designing reward functions is a core component of reinforcement learning.
It can be exceedingly inefficient to learn rewards as they are often learned tabula rasa.
We propose In-Context Preference Learning (ICPL) to accelerate learning reward functions from preferences.
arXiv Detail & Related papers (2024-10-22T17:53:34Z)
- Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks [3.1594865504808944]
We propose an RL algorithm that automatically structures the reward function for sample efficiency, given a set of labels that signify subtasks.
We evaluate our algorithm in a variety of sparse-reward environments (a sketch of subtask-reward composition follows this entry).
arXiv Detail & Related papers (2024-01-25T15:06:40Z)
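A minimal sketch of how subtask labels might be composed into a shaped reward, as referenced above. The predicates, state keys, and sequential payout scheme are assumptions for illustration, not the paper's algorithm.

```python
from typing import Callable, Dict

# Hypothetical subtask predicates over an environment state dictionary; the
# labels and their ordering are assumptions for illustration.
subtasks: Dict[str, Callable[[dict], bool]] = {
    "reach": lambda s: s["dist_to_object"] < 0.05,
    "grasp": lambda s: s["gripper_closed"] and s["object_held"],
    "place": lambda s: s["object_at_goal"],
}

def composed_reward(state: dict) -> float:
    # Pay partial reward for each subtask completed, in order, so progress
    # through the sequence densifies an otherwise sparse task reward.
    done = 0
    for predicate in subtasks.values():
        if predicate(state):
            done += 1
        else:
            break
    return done / len(subtasks)

state = {"dist_to_object": 0.02, "gripper_closed": True,
         "object_held": True, "object_at_goal": False}
print(composed_reward(state))  # 2/3: reached and grasped, not yet placed
```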
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Learning Reward for Physical Skills using Large Language Model [5.795405764196473]
Large Language Models contain valuable task-related knowledge that can aid in learning reward functions.
We aim to extract task knowledge from LLMs using environment feedback to create efficient reward functions for physical skills.
arXiv Detail & Related papers (2023-10-21T19:10:06Z)
- Skill-Based Reinforcement Learning with Intrinsic Reward Matching [77.34726150561087]
We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
arXiv Detail & Related papers (2022-10-14T00:04:49Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms (a sketch of this idea follows the entry).
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
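A minimal sketch of one common way to realize an uncertainty-based exploration bonus: disagreement across an ensemble of learned reward models. This is an assumed formulation for illustration, not necessarily the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "ensemble" of K learned reward models, here linear heads over
# state-action features; in practice these would be trained networks.
K, dim = 5, 4
ensemble = rng.normal(size=(K, dim))

def intrinsic_bonus(feat):
    # High standard deviation across ensemble predictions = high reward
    # uncertainty = a novel region worth exploring.
    return (ensemble @ feat).std()

feat = rng.normal(size=dim)
extrinsic = 0.0   # learned task reward (uninformative early in training)
beta = 0.1        # scale of the exploration bonus
print("shaped reward:", extrinsic + beta * intrinsic_bonus(feat))
```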
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods (a sketch of the experience-relabeling idea follows this entry).
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
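A minimal sketch of the experience-relabeling idea, under the assumption that relabeling means recomputing stored rewards with the latest reward model; the linear reward stand-ins are illustrative, not PEBBLE's actual code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stored off-policy transitions, summarized by state-action features, with
# rewards computed under a stale reward model.
buffer_feats = rng.normal(size=(100, 4))
old_theta = rng.normal(size=4)
buffer_rewards = buffer_feats @ old_theta

# After new human feedback updates the reward model, relabel the buffer:
# keep every stored transition but recompute its reward, so off-policy
# learning always trains against the current reward estimate.
new_theta = rng.normal(size=4)
buffer_rewards = buffer_feats @ new_theta
print("first relabeled reward:", buffer_rewards[0])
```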
- Active Preference-Based Gaussian Process Regression for Reward Learning [42.697198807877925]
One common approach is to learn reward functions from collected expert demonstrations.
We present a preference-based learning approach in which, as an alternative, human feedback takes the form only of comparisons between trajectories.
Our approach enables us to tackle both the inflexibility and data-inefficiency problems within a preference-based learning framework (a sketch of preference-based reward fitting follows this entry).
arXiv Detail & Related papers (2020-05-06T03:29:27Z)
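A minimal sketch of fitting a reward from trajectory comparisons using a Bradley-Terry-style likelihood. This generic preference-learning formulation is assumed for illustration and is not this paper's Gaussian-process model.

```python
import numpy as np

rng = np.random.default_rng(3)

def traj_return(theta, feats):
    # Trajectory return under a linear reward on summed trajectory features.
    return feats @ theta

# Toy data: ten trajectory feature summaries and five comparisons, each
# recording that trajectory a was preferred over trajectory b.
feats = rng.normal(size=(10, 3))
pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

theta = np.zeros(3)
for _ in range(500):
    grad = np.zeros(3)
    for a, b in pairs:
        # Bradley-Terry model: P(a preferred over b) = sigmoid(R_a - R_b);
        # accumulate the gradient of the negative log-likelihood.
        p = 1.0 / (1.0 + np.exp(-(traj_return(theta, feats[a]) - traj_return(theta, feats[b]))))
        grad += (p - 1.0) * (feats[a] - feats[b])
    theta -= 0.1 * grad
print("preference-fit reward weights:", theta)
```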
- Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training (a sketch of skill composition via MPC follows this entry).
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
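A minimal sketch of composing skills with model predictive control, assuming a one-step outcome model per skill; the stand-in models and goal-distance objective are illustrative assumptions, not the paper's planner.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in learned outcome models: the predicted state change from executing
# each of eight skills (in practice, learned skill dynamics).
num_skills, state_dim = 8, 2
skill_effects = rng.normal(size=(num_skills, state_dim))

def mpc_select_skill(state, goal):
    # One-step MPC over skills: predict each skill's outcome and pick the
    # skill whose outcome lands closest to the goal.
    predicted = state + skill_effects
    return int(np.argmin(np.linalg.norm(predicted - goal, axis=1)))

state, goal = np.zeros(2), np.array([1.0, 1.0])
print("chosen skill:", mpc_select_skill(state, goal))
```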
- Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty [14.178202899299267]
We propose a simple but powerful reward shaping method, namely Dense2Sparse.
It combines the fast convergence of a dense reward with the noise isolation of a sparse reward to balance learning efficiency and effectiveness.
The experiments show that Dense2Sparse achieves higher expected reward than standalone dense or sparse rewards, and that it tolerates system uncertainty better (a sketch of the schedule follows this entry).
arXiv Detail & Related papers (2020-03-05T16:10:15Z)
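A minimal sketch of a Dense2Sparse-style schedule, assuming the simplest form of the idea: a single switch from dense shaping to the sparse task reward. The switch-step form is an assumption, not necessarily the paper's exact scheme.

```python
def dense_reward(dist_to_goal: float) -> float:
    # Dense shaping: informative gradient, but sensitive to state-estimation noise.
    return -dist_to_goal

def sparse_reward(dist_to_goal: float, tol: float = 0.05) -> float:
    # Sparse success signal: uninformative early on, but isolates sensor noise.
    return 1.0 if dist_to_goal < tol else 0.0

def dense2sparse(step: int, dist_to_goal: float, switch_step: int = 50_000) -> float:
    # Train on the dense reward early for fast convergence, then switch to the
    # sparse reward; the switch step is the schedule's single hyperparameter.
    return dense_reward(dist_to_goal) if step < switch_step else sparse_reward(dist_to_goal)

print(dense2sparse(10_000, 0.3), dense2sparse(80_000, 0.03))
```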