REvolve: Reward Evolution with Large Language Models for Autonomous Driving
- URL: http://arxiv.org/abs/2406.01309v1
- Date: Mon, 3 Jun 2024 13:23:27 GMT
- Title: REvolve: Reward Evolution with Large Language Models for Autonomous Driving
- Authors: Rishi Hazra, Alkis Sygkounas, Andreas Persson, Amy Loutfi, Pedro Zuidberg Dos Martires
- Abstract summary: Large language models (LLMs) have been used for reward generation from natural language task descriptions.
We introduce REvolve, an evolutionary framework that uses LLMs for reward design in autonomous driving.
We demonstrate that agents trained on REvolve-designed rewards align closely with human driving standards, thereby outperforming other state-of-the-art baselines.
- Score: 6.4550546442058225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing effective reward functions is crucial to training reinforcement learning (RL) algorithms. However, this design is non-trivial, even for domain experts, due to the subjective nature of certain tasks that are hard to quantify explicitly. In recent works, large language models (LLMs) have been used for reward generation from natural language task descriptions, leveraging their extensive instruction tuning and commonsense understanding of human behavior. In this work, we hypothesize that LLMs, guided by human feedback, can be used to formulate human-aligned reward functions. Specifically, we study this in the challenging setting of autonomous driving (AD), wherein notions of "good" driving are tacit and hard to quantify. To this end, we introduce REvolve, an evolutionary framework that uses LLMs for reward design in AD. REvolve creates and refines reward functions by utilizing human feedback to guide the evolution process, effectively translating implicit human knowledge into explicit reward functions for training (deep) RL agents. We demonstrate that agents trained on REvolve-designed rewards align closely with human driving standards, thereby outperforming other state-of-the-art baselines.
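As a rough illustration of the loop the abstract describes, the sketch below treats LLM-written reward functions as an evolving population whose fitness comes from human feedback on the agents trained with them. The function names, selection scheme, and stubbed components are assumptions made for illustration, not the paper's actual implementation.

```python
import random

# --- Placeholder components (assumptions for illustration, not REvolve's API) ---
def llm_propose_reward(parent_code=None, feedback=None):
    """Ask an LLM to write (or refine) a reward function as Python source."""
    # A real system would prompt an LLM with the task description, a parent
    # reward function, and human feedback on the agent it produced.
    return "def reward(obs): return -abs(obs['lane_offset'])"

def train_agent(reward_code):
    """Train a (deep) RL driving agent with the candidate reward; stubbed here."""
    return {"reward_code": reward_code}

def human_fitness(agent):
    """Human evaluators score the agent's driving; stubbed with a random score."""
    return random.random()

# --- Evolutionary loop sketched from the abstract --------------------------------
def revolve(generations=3, population_size=4, elites=2):
    population = [llm_propose_reward() for _ in range(population_size)]
    best = None
    for _ in range(generations):
        agents = [train_agent(code) for code in population]      # 1. train one agent per candidate
        scores = [human_fitness(a) for a in agents]              # 2. human feedback acts as fitness
        ranked = sorted(zip(scores, population), key=lambda sp: sp[0], reverse=True)
        best = ranked[0][1]
        survivors = [code for _, code in ranked[:elites]]
        # 3. The LLM refines the fittest reward functions into the next generation.
        children = [llm_propose_reward(parent_code=random.choice(survivors),
                                       feedback="human critique of this agent's driving")
                    for _ in range(population_size - elites)]
        population = survivors + children
    return best

if __name__ == "__main__":
    print(revolve())
```

In the full method, the stubbed pieces are the expensive parts: each candidate reward trains a deep RL driving agent, and human evaluators compare the resulting behaviours to supply the fitness signal.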
Related papers
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
We study reward shaping with vision-language models (VLMs) to define dense rewards for robotic learning.
On a real-world manipulation task specified by natural language description, we find that these rewards improve the sample efficiency of autonomous RL.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- OCALM: Object-Centric Assessment with Language Models [33.10137796492542]
We propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for reinforcement learning agents.
OCALM uses the extensive world-knowledge of language models to derive reward functions focused on relational concepts.
arXiv Detail & Related papers (2024-06-24T15:57:48Z)
- Generating and Evolving Reward Functions for Highway Driving with Large Language Models [18.464822261908562]
Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies.
We introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving.
arXiv Detail & Related papers (2024-06-15T07:50:10Z)
- In-context Learning for Automated Driving Scenarios [15.325910109153616]
One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively.
This paper introduces an innovative approach utilizing Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way.
arXiv Detail & Related papers (2024-05-07T09:04:52Z)
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use these attention weights to redistribute the reward along the whole completion (a minimal sketch of this step appears after this list).
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models [21.052532074815765]
We introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework.
It enables RL agents to learn robotic tasks efficiently by taking advantage of Large Language Models' timely feedback.
It outperforms the baseline in terms of both learning efficiency and success rate.
arXiv Detail & Related papers (2023-11-04T11:21:38Z)
- Eureka: Human-Level Reward Design via Coding Large Language Models [121.91007140014982]
Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks.
We present Eureka, a human-level reward design algorithm powered by LLMs.
Eureka exploits the remarkable zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs.
arXiv Detail & Related papers (2023-10-19T17:31:01Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward (sketched after this list).
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
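The "Dense Reward for Free" entry above describes redistributing the reward model's scalar score over the whole completion using its attention weights. Below is a minimal NumPy sketch of that redistribution step, assuming the per-token attention weights have already been extracted from the reward model; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def redistribute_reward(scalar_reward, attention_weights):
    """Spread a sequence-level scalar reward over tokens in proportion to the
    reward model's attention weights (an illustrative sketch, not the paper's code)."""
    w = np.asarray(attention_weights, dtype=float)
    w = w / w.sum()            # normalise attention into a distribution over tokens
    return scalar_reward * w   # dense per-token rewards that sum to the scalar

# Example: a 5-token completion that the reward model scored 0.8 overall.
dense = redistribute_reward(0.8, [0.1, 0.4, 0.2, 0.2, 0.1])
print(dense, dense.sum())      # per-token rewards; the sequence-level total is preserved
```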
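Similarly, the "Reward Uncertainty for Exploration" entry frames uncertainty in the learned reward as an exploration signal. One reading of that idea, sketched below under the assumption that several reward models are trained on the same preference data; the ensemble, the `beta` coefficient, and the function names are assumptions for illustration.

```python
import numpy as np

def shaped_reward(reward_ensemble, state_action, beta=0.5):
    """Combine the ensemble's mean reward estimate with an uncertainty-based
    exploration bonus: high disagreement marks novel behaviour (a sketch)."""
    preds = np.array([r(state_action) for r in reward_ensemble])
    return preds.mean() + beta * preds.std()

# Toy example: three learned reward models that disagree on a state-action pair.
ensemble = [lambda sa: 0.2, lambda sa: 0.8, lambda sa: 0.5]
print(shaped_reward(ensemble, state_action=None))
```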