Generating and Evolving Reward Functions for Highway Driving with Large Language Models
- URL: http://arxiv.org/abs/2406.10540v1
- Date: Sat, 15 Jun 2024 07:50:10 GMT
- Title: Generating and Evolving Reward Functions for Highway Driving with Large Language Models
- Authors: Xu Han, Qiannan Yang, Xianda Chen, Xiaowen Chu, Meixin Zhu
- Abstract summary: Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies.
We introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving.
- Score: 18.464822261908562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions has been a complex, manual process in practice. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving. This framework utilizes the coding capabilities of LLMs, proven in other areas, to generate and evolve reward functions for highway scenarios. The framework starts by instructing LLMs to create an initial reward function code based on the driving environment and task descriptions. This code is then refined through iterative cycles involving RL training and LLMs' reflection, which benefits from their ability to review and improve the output. We have also developed a specific prompt template to improve LLMs' understanding of complex driving simulations, ensuring the generation of effective and error-free code. Our experiments in a highway driving simulator across three traffic configurations show that our method surpasses expert handcrafted reward functions, achieving a 22% higher average success rate. This not only indicates safer driving but also suggests significant gains in development productivity.
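The abstract outlines a generate-train-reflect cycle: the LLM writes reward code, an RL agent is trained with it, and the training statistics are fed back so the LLM can revise the reward. The following is a minimal sketch of how such a loop could be organized, assuming access to a highway driving simulator and an LLM client; every identifier here (query_llm, train_policy, evaluate_policy, the reward function name) is a hypothetical placeholder, not the authors' actual implementation.

```python
# Minimal sketch of a generate-train-reflect reward-evolution loop.
# query_llm, train_policy, and evaluate_policy are assumed callables
# supplied by the surrounding experiment code.

def evolve_reward(env_description: str, task_description: str,
                  query_llm, train_policy, evaluate_policy,
                  iterations: int = 5):
    """Iteratively ask an LLM for reward-function code, train an RL agent
    with it, and feed the training statistics back for reflection."""
    prompt = (
        "Write a Python reward function `reward(env, action)` for a highway "
        f"driving simulator.\nEnvironment: {env_description}\nTask: {task_description}"
    )
    best_code, best_score = None, float("-inf")
    feedback = ""
    for _ in range(iterations):
        reward_code = query_llm(prompt + feedback)        # LLM writes reward code
        try:
            namespace = {}
            exec(reward_code, namespace)                  # load the generated function
            policy = train_policy(namespace["reward"])    # RL training with this reward
            stats = evaluate_policy(policy)               # e.g. success and collision rates
        except Exception as err:                          # broken code: ask the LLM to repair it
            feedback = f"\nYour previous code failed with: {err}. Please fix it."
            continue
        if stats["success_rate"] > best_score:
            best_code, best_score = reward_code, stats["success_rate"]
        # Reflection: summarize training outcomes so the LLM can refine the reward terms.
        feedback = (f"\nPrevious reward achieved success rate {stats['success_rate']:.2f} "
                    f"and collision rate {stats['collision_rate']:.2f}. "
                    "Revise the reward function to improve these metrics.")
    return best_code, best_score
```

The reflection message carries the role the abstract assigns to the LLM's self-review: training outcomes are folded back into the next prompt so the model can adjust, for instance, how speed, collision, and lane-keeping terms are weighted.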
Related papers
- OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework [3.8320050452121692]
We introduce OWLed, the Outlier-Weighed Layerwise Pruning for Efficient Autonomous Driving Framework.
Our method assigns non-uniform sparsity ratios to different layers based on the distribution of outlier features.
To ensure the compressed model adapts well to autonomous driving tasks, we incorporate driving environment data into both the calibration and pruning processes.
arXiv Detail & Related papers (2024-11-12T10:55:30Z)
- In-context Learning for Automated Driving Scenarios [15.325910109153616]
One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively.
This paper introduces an innovative approach utilizing Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way.
arXiv Detail & Related papers (2024-05-07T09:04:52Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimizes the model by masking the unexecuted code segments, providing Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics [14.773498542408264]
Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge.
We propose in this work a novel LLM framework with a self-refinement mechanism for automated reward function design.
arXiv Detail & Related papers (2023-09-13T02:56:56Z)
- Language Reward Modulation for Pretraining Reinforcement Learning [61.76572261146311]
We propose leveraging the capabilities of LRFs as a pretraining signal for reinforcement learning.
Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks.
arXiv Detail & Related papers (2023-08-23T17:37:51Z)
- Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z)
- Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers [0.4241054493737716]
This study implements, evaluates, and compares two DRL algorithms, Deep Q-networks (DQN) and Trust Region Policy Optimization (TRPO).
Models trained on the designed ComplexRoads environment can adapt well to other driving maneuvers with promising overall performance.
arXiv Detail & Related papers (2023-06-20T11:41:01Z)
- Hierarchical Program-Triggered Reinforcement Learning Agents For Automated Driving [5.404179497338455]
Recent advances in Reinforcement Learning (RL) combined with Deep Learning (DL) have demonstrated impressive performance in complex tasks, including autonomous driving.
We propose HPRL - Hierarchical Program-triggered Reinforcement Learning, which uses a hierarchy consisting of a structured program along with multiple RL agents, each trained to perform a relatively simple task.
The focus of verification shifts to the master program under simple guarantees from the RL agents, leading to a significantly more interpretable and verifiable implementation as compared to a complex RL agent.
arXiv Detail & Related papers (2021-03-25T14:19:54Z)
- Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning [52.2663102239029]
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing platforms.
Our approach learns a ride-based state-value function using a batch training algorithm with deep value networks.
We benchmark our algorithm with baselines in a ride-hailing simulation environment to demonstrate its superiority in improving income efficiency.
arXiv Detail & Related papers (2021-03-08T05:34:05Z)