Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics
- URL: http://arxiv.org/abs/2309.06687v2
- Date: Mon, 2 Oct 2023 17:20:21 GMT
- Title: Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics
- Authors: Jiayang Song, Zhehua Zhou, Jiawei Liu, Chunrong Fang, Zhan Shu, Lei Ma
- Abstract summary: Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge.
We propose in this work a novel LLM framework with a self-refinement mechanism for automated reward function design.
- Score: 14.773498542408264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although Deep Reinforcement Learning (DRL) has achieved notable success in
numerous robotic applications, designing a high-performing reward function
remains a challenging task that often requires substantial manual input.
Recently, Large Language Models (LLMs) have been extensively adopted to address
tasks demanding in-depth common-sense knowledge, such as reasoning and
planning. Recognizing that reward function design is also inherently linked to
such knowledge, LLMs offer promising potential in this context. Motivated by
this, we propose in this work a novel LLM framework with a self-refinement
mechanism for automated reward function design. The framework commences with
the LLM formulating an initial reward function based on natural language
inputs. Then, the performance of the reward function is assessed, and the
results are presented back to the LLM for guiding its self-refinement process.
We examine the performance of our proposed framework through a variety of
continuous robotic control tasks across three diverse robotic systems. The
results indicate that our LLM-designed reward functions are able to rival or
even surpass manually designed reward functions, highlighting the efficacy and
applicability of our approach.
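As a rough illustration of the design-evaluate-refine cycle the abstract describes, a minimal Python sketch of such a loop is given below. The helper names query_llm and train_and_evaluate, the prompt wording, and the number of refinement rounds are illustrative assumptions, not the authors' actual implementation.

from typing import Callable, Dict

def design_reward(
    task_description: str,
    query_llm: Callable[[str], str],
    train_and_evaluate: Callable[[str], Dict[str, float]],
    num_refinements: int = 3,
) -> str:
    # Step 1: the LLM drafts an initial reward function from the
    # natural-language task description (hypothetical prompt wording).
    reward_code = query_llm(
        "Write a Python reward function for the following robotic task:\n"
        + task_description
    )
    # Step 2: repeatedly train a DRL policy with the candidate reward,
    # evaluate it, and return the results to the LLM so it can refine
    # its own earlier output.
    for _ in range(num_refinements):
        metrics = train_and_evaluate(reward_code)  # e.g. success rate, mean episode return
        feedback = (
            f"Training with the reward function below gave these results: {metrics}. "
            "Revise the reward function to improve task performance.\n\n" + reward_code
        )
        reward_code = query_llm(feedback)
    return reward_code

In this sketch the reward function is passed around as source code (a string), mirroring the idea that the LLM both writes and revises it; how the evaluation results are summarized for the LLM is a design choice that the paper's feedback step would determine.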
Related papers
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the LLM decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvement of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
- CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations [1.03590082373586]
This paper presents a novel method for learning reward functions for robotic motions by harnessing the power of a CLIP-based model.
Our approach capitalizes on CLIP's capability to process both state features and image inputs effectively.
arXiv Detail & Related papers (2023-11-06T19:48:03Z)
- Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models [21.052532074815765]
We introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework.
It enables RL agents to learn robotic tasks efficiently by taking advantage of Large Language Models' timely feedback.
It outperforms the baseline in terms of both learning efficiency and success rate.
arXiv Detail & Related papers (2023-11-04T11:21:38Z)
- Learning Reward for Physical Skills using Large Language Model [5.795405764196473]
Large Language Models contain valuable task-related knowledge that can aid in learning reward functions.
We aim to extract task knowledge from LLMs using environment feedback to create efficient reward functions for physical skills.
arXiv Detail & Related papers (2023-10-21T19:10:06Z)
- Language to Rewards for Robotic Skill Synthesis [37.21434094015743]
We introduce a new paradigm that harnesses large language models (LLMs) to define reward parameters that can be optimized to accomplish a variety of robotic tasks.
Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions.
arXiv Detail & Related papers (2023-06-14T17:27:10Z)
- Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning [55.2080971216584]
We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).
We develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches.
arXiv Detail & Related papers (2023-01-26T01:06:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.