Learning Reward for Physical Skills using Large Language Model
- URL: http://arxiv.org/abs/2310.14092v1
- Date: Sat, 21 Oct 2023 19:10:06 GMT
- Title: Learning Reward for Physical Skills using Large Language Model
- Authors: Yuwei Zeng, Yiqing Xu
- Abstract summary: Large Language Models contain valuable task-related knowledge that can aid in learning reward functions.
We aim to extract task knowledge from LLMs using environment feedback to create efficient reward functions for physical skills.
- Score: 5.795405764196473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning reward functions for physical skills is challenging due to the
vast spectrum of skills, the high dimensionality of the state and action spaces, and
nuanced sensory feedback. The complexity of these tasks makes acquiring expert
demonstration data both costly and time-consuming. Large Language Models (LLMs)
contain valuable task-related knowledge that can aid in learning these reward
functions. However, directly applying LLMs to propose reward functions has
limitations, such as numerical instability and an inability to incorporate
environment feedback. We aim to extract task knowledge from
LLMs using environment feedback to create efficient reward functions for
physical skills. Our approach consists of two components. We first use the LLM
to propose features and parameterization of the reward function. Next, we
update the parameters of this proposed reward function through an iterative
self-alignment process. In particular, this process minimizes the ranking
inconsistency between the LLM and the learned reward function on newly collected
observations. We validated our method on three simulated physical skill learning
tasks, demonstrating the effectiveness of our design choices.
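As a rough illustration of the two components described above, the sketch below assumes a linear reward over LLM-proposed features and a pairwise logistic ranking loss for the self-alignment step; the feature set, observation format, and all names are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

# Hypothetical LLM-proposed features for a physical skill (e.g., reaching a target).
# In the paper, the LLM proposes both the features and the parameterization.
def features(obs):
    return np.array([
        -obs["dist_to_target"],     # move closer to the target
        -obs["action_magnitude"],   # prefer smooth, low-effort actions
        float(obs["contact"]),      # reward making contact
    ])

def reward(theta, obs):
    # Linear parameterization over the proposed features (an assumption).
    return float(theta @ features(obs))

def ranking_loss(theta, preferred, rejected):
    # Pairwise logistic loss: penalizes pairs where the learned reward
    # disagrees with the LLM's ranking of the two observations.
    margin = reward(theta, preferred) - reward(theta, rejected)
    return np.log1p(np.exp(-margin))

def self_alignment_step(theta, llm_ranked_pairs, lr=0.1, eps=1e-4):
    # One self-alignment iteration: reduce ranking inconsistency on pairs
    # (preferred, rejected) labeled by the LLM over newly collected
    # observations. The gradient is estimated numerically to keep the
    # sketch dependency-free.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        bump = np.zeros_like(theta)
        bump[i] = eps
        loss_hi = sum(ranking_loss(theta + bump, p, r) for p, r in llm_ranked_pairs)
        loss_lo = sum(ranking_loss(theta - bump, p, r) for p, r in llm_ranked_pairs)
        grad[i] = (loss_hi - loss_lo) / (2 * eps)
    return theta - lr * grad

# Example usage with hypothetical observations (the LLM is assumed to prefer obs_a):
theta = np.zeros(3)
obs_a = {"dist_to_target": 0.1, "action_magnitude": 0.2, "contact": 1}
obs_b = {"dist_to_target": 0.9, "action_magnitude": 0.8, "contact": 0}
theta = self_alignment_step(theta, [(obs_a, obs_b)])
```

In this reading, each iteration collects new rollouts under the current reward, asks the LLM to rank pairs of resulting observations, and updates the parameters until the learned reward reproduces those rankings.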
Related papers
- Automated Rewards via LLM-Generated Progress Functions [47.50772243693897]
Large Language Models (LLMs) have the potential to automate reward engineering by leveraging their broad domain knowledge across various tasks.
This paper introduces an LLM-driven reward generation framework that is able to produce state-of-the-art policies on the challenging Bi-DexHands benchmark.
arXiv Detail & Related papers (2024-10-11T18:41:15Z) - Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z) - OCALM: Object-Centric Assessment with Language Models [33.10137796492542]
We propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for reinforcement learning agents.
OCALM uses the extensive world-knowledge of language models to derive reward functions focused on relational concepts.
arXiv Detail & Related papers (2024-06-24T15:57:48Z) - Learning Reward for Robot Skills Using Large Language Models via Self-Alignment [11.639973274337274]
Large Language Models (LLMs) contain valuable task-related knowledge that can potentially aid in the learning of reward functions.
We propose a method to learn rewards more efficiently in the absence of humans.
arXiv Detail & Related papers (2024-05-12T04:57:43Z) - Enhancing Q-Learning with Large Language Model Heuristics [0.0]
Large language models (LLMs) can achieve zero-shot learning for simpler tasks, but they suffer from low inference speeds and occasional hallucinations.
We propose LLM-guided Q-learning, a framework that leverages LLMs as heuristics to aid in learning the Q-function for reinforcement learning.
arXiv Detail & Related papers (2024-05-06T10:42:28Z) - Self-Refined Large Language Model as Automated Reward Function Designer
for Deep Reinforcement Learning in Robotics [14.773498542408264]
Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge.
We propose in this work a novel LLM framework with a self-refinement mechanism for automated reward function design.
arXiv Detail & Related papers (2023-09-13T02:56:56Z) - OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z) - Learning Options via Compression [62.55893046218824]
We propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills.
Our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood.
arXiv Detail & Related papers (2022-12-08T22:34:59Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Learning Value Functions from Undirected State-only Experience [17.76847333440422]
We show that Q-learning in discrete Markov decision processes (MDPs) learns the same value function under any arbitrary refinement of the action space.
This theoretical result motivates the design of Latent Action Q-learning or LAQ, an offline RL method that can learn effective value functions from state-only experience.
We show that LAQ can recover value functions that have high correlation with value functions learned using ground truth actions.
arXiv Detail & Related papers (2022-04-26T17:24:36Z) - Emergent Real-World Robotic Skills via Unsupervised Off-Policy
Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.