Generating and Evolving Reward Functions for Highway Driving with Large Language Models
- URL: http://arxiv.org/abs/2406.10540v1
- Date: Sat, 15 Jun 2024 07:50:10 GMT
- Title: Generating and Evolving Reward Functions for Highway Driving with Large Language Models
- Authors: Xu Han, Qiannan Yang, Xianda Chen, Xiaowen Chu, Meixin Zhu
- Abstract summary: Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies.
We introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving.
- Score: 18.464822261908562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions has been a complex, manual process in many practices. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving. This framework utilizes the coding capabilities of LLMs, proven in other areas, to generate and evolve reward functions for highway scenarios. The framework starts with instructing LLMs to create an initial reward function code based on the driving environment and task descriptions. This code is then refined through iterative cycles involving RL training and LLMs' reflection, which benefits from their ability to review and improve the output. We have also developed a specific prompt template to improve LLMs' understanding of complex driving simulations, ensuring the generation of effective and error-free code. Our experiments in a highway driving simulator across three traffic configurations show that our method surpasses expert handcrafted reward functions, achieving a 22% higher average success rate. This not only indicates safer driving but also suggests significant gains in development productivity.
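As a rough illustration of the generate-train-reflect loop the abstract describes, here is a minimal Python sketch. The names query_llm and train_rl_agent are placeholder stubs, not the authors' actual interface, and the prompt text is invented for illustration:

```python
# Minimal sketch of the generate -> train -> reflect loop from the abstract.
# query_llm and train_rl_agent are placeholders, not the authors' API.
import textwrap

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., via an API client)."""
    raise NotImplementedError

def train_rl_agent(reward_code: str) -> dict:
    """Placeholder: compile reward_code, train an RL agent, return metrics."""
    raise NotImplementedError

TASK_PROMPT = textwrap.dedent("""\
    You write Python reward functions for a highway-driving RL environment.
    Observation: ego speed, lane index, distances to surrounding vehicles.
    Goal: drive fast, avoid collisions, keep to the right lane.
    Return only a function `def reward(obs, action) -> float`.""")

reward_code = query_llm(TASK_PROMPT)            # initial reward function code
for iteration in range(5):                      # evolution cycles
    metrics = train_rl_agent(reward_code)       # RL training with this reward
    reflection_prompt = (
        f"{TASK_PROMPT}\n\nPrevious reward function:\n{reward_code}\n"
        f"Training results: {metrics}\n"
        "Reflect on weaknesses and return an improved reward function."
    )
    reward_code = query_llm(reflection_prompt)  # LLM reflection step
```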
Related papers
- TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning [61.33599727106222]
TeLL-Drive is a hybrid framework that integrates a Teacher LLM to guide an attention-based Student DRL policy.
A self-attention mechanism then fuses these strategies with the DRL agent's exploration, accelerating policy convergence and boosting robustness.
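The summary above suggests attention-based fusion of teacher guidance with the student policy. A hedged PyTorch sketch of one way such fusion could look; the module layout, dimensions, and the discrete teacher-action encoding are all assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class TeacherFusedPolicy(nn.Module):
    """Illustrative student policy that attends to a teacher's action hint."""
    def __init__(self, state_dim=64, n_actions=5, embed_dim=64):
        super().__init__()
        self.state_enc = nn.Linear(state_dim, embed_dim)
        self.teacher_enc = nn.Embedding(n_actions, embed_dim)  # teacher's discrete suggestion
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(embed_dim, n_actions)

    def forward(self, state, teacher_action):
        q = self.state_enc(state).unsqueeze(1)              # (B, 1, E) query: student state
        kv = self.teacher_enc(teacher_action).unsqueeze(1)  # (B, 1, E) key/value: teacher hint
        fused, _ = self.attn(q, kv, kv)                     # attend to teacher guidance
        return self.head(fused.squeeze(1))                  # action logits

policy = TeacherFusedPolicy()
logits = policy(torch.randn(8, 64), torch.randint(0, 5, (8,)))
```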
arXiv Detail & Related papers (2025-02-03T14:22:03Z) - LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models [10.425038112892922]
We introduce LearningFlow, an innovative automated policy learning workflow tailored to urban driving.
This framework leverages the collaboration of multiple large language model (LLM) agents throughout the reinforcement learning (RL) training process.
It automates policy learning across a series of complex driving tasks, and it significantly reduces the reliance on manual reward function design.
arXiv Detail & Related papers (2025-01-09T08:28:16Z) - VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving [1.3107174618549584]
Reinforcement learning (RL)-based methods for learning driving policies have gained increasing attention in the autonomous driving community.
Traditional RL approaches rely on manually engineered rewards, which require extensive human effort and often lack generalizability.
We propose VLM-RL, a unified framework that integrates pre-trained Vision-Language Models (VLMs) with RL to generate reward signals.
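One common way to turn a pre-trained VLM into a reward signal is to score each camera frame against a language goal with CLIP; the sketch below illustrates that general idea, though VLM-RL's actual formulation (e.g., how it contrasts goal descriptions) may differ:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def vlm_reward(frame: Image.Image,
               goal: str = "the car is driving safely in its lane") -> float:
    """Dense reward: cosine similarity between a frame and a language goal."""
    inputs = processor(text=[goal], images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item()
```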
arXiv Detail & Related papers (2024-12-20T04:08:11Z) - Human-centric Reward Optimization for Reinforcement Learning-based Automated Driving using Large Language Models [15.11759379703718]
One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively.
This paper introduces an innovative approach that uses large language models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way.
arXiv Detail & Related papers (2024-05-07T09:04:52Z) - ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z) - How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
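A toy sketch of what "LLM guidance as a regularization factor" can look like in value-based RL: with a KL penalty of strength lam toward the LLM's policy pi_llm, the soft Bellman backup has the closed form V(s) = lam * log sum_a pi_llm(a|s) exp(Q(s,a)/lam). The specifics below are assumptions for illustration, not LINVIT's exact algorithm:

```python
import numpy as np

def regularized_value(Q: np.ndarray, pi_llm: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Q: (S, A) action values; pi_llm: (S, A) LLM prior; returns V: (S,).
    Closed-form solution of max_pi E_pi[Q] - lam * KL(pi || pi_llm)."""
    return lam * np.log(np.sum(pi_llm * np.exp(Q / lam), axis=1))

Q = np.array([[1.0, 0.0], [0.2, 0.8]])
pi_llm = np.array([[0.9, 0.1], [0.5, 0.5]])
print(regularized_value(Q, pi_llm))  # as lam -> 0, recovers max_a Q on the prior's support
```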
arXiv Detail & Related papers (2024-02-25T20:07:13Z) - StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO provides Fine-Grained Optimization by masking unexecuted code segments, so that only executed code contributes to the optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
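A minimal sketch of the FGO idea, masking tokens from unexecuted code segments out of the policy-gradient loss; the tensor shapes and the source of the execution mask are illustrative assumptions:

```python
import torch

def fgo_loss(logprobs: torch.Tensor, advantages: torch.Tensor,
             executed_mask: torch.Tensor) -> torch.Tensor:
    """logprobs, advantages, executed_mask: (B, T). The mask is 1 where the
    corresponding code token was executed (e.g., by unit tests), else 0."""
    per_token = -logprobs * advantages           # vanilla policy-gradient term
    masked = per_token * executed_mask           # drop unexecuted segments
    return masked.sum() / executed_mask.sum().clamp(min=1)

loss = fgo_loss(torch.randn(2, 8), torch.randn(2, 8),
                torch.randint(0, 2, (2, 8)).float())
```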
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics [14.773498542408264]
Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge.
We propose in this work a novel LLM framework with a self-refinement mechanism for automated reward function design.
arXiv Detail & Related papers (2023-09-13T02:56:56Z) - Language Reward Modulation for Pretraining Reinforcement Learning [61.76572261146311]
We propose leveraging the capabilities of learned reward functions (LRFs) as a pretraining signal for reinforcement learning.
Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks.
arXiv Detail & Related papers (2023-08-23T17:37:51Z) - Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
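A generic sketch of the planning pattern the summary describes: score candidate multistep trajectories with a learned value and execute the best one. The network and rollout shapes are assumptions, not TRAVL's architecture:

```python
import torch
import torch.nn as nn

# Toy value network over flattened 10-step, 4-dim-state trajectories.
value_net = nn.Sequential(nn.Linear(10 * 4, 64), nn.ReLU(), nn.Linear(64, 1))

def plan(candidates: torch.Tensor) -> int:
    """candidates: (N, 10, 4) = N candidate trajectories of 10 steps each."""
    with torch.no_grad():
        scores = value_net(candidates.flatten(1)).squeeze(-1)  # value per trajectory
    return int(scores.argmax())  # index of the best-scoring trajectory

best = plan(torch.randn(16, 10, 4))
```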
arXiv Detail & Related papers (2023-06-27T17:58:39Z) - Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers [0.4241054493737716]
This study implements, evaluates, and compares two DRL algorithms: Deep Q-Networks (DQN) and Trust Region Policy Optimization (TRPO).
Models trained on the designed ComplexRoads environment can adapt well to other driving maneuvers with promising overall performance.
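Both algorithms named above are available off the shelf: DQN in Stable-Baselines3 and TRPO in sb3-contrib, and they compose with highway-env's Gymnasium environments. The sketch below uses the stock highway-v0 task as a stand-in, since the paper's custom ComplexRoads environment is not reproduced here:

```python
import gymnasium as gym
import highway_env  # registers highway-v0 and related environments
from stable_baselines3 import DQN
from sb3_contrib import TRPO

env = gym.make("highway-v0")
dqn = DQN("MlpPolicy", env, verbose=1).learn(total_timesteps=20_000)
trpo = TRPO("MlpPolicy", env, verbose=1).learn(total_timesteps=20_000)
```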
arXiv Detail & Related papers (2023-06-20T11:41:01Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.