In-context Learning for Automated Driving Scenarios
- URL: http://arxiv.org/abs/2405.04135v1
- Date: Tue, 7 May 2024 09:04:52 GMT
- Title: In-context Learning for Automated Driving Scenarios
- Authors: Ziqi Zhou, Jingyue Zhang, Jingyuan Zhang, Boyue Wang, Tianyu Shi, Alaa Khamis
- Abstract summary: One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively.
This paper introduces an innovative approach utilizing Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way.
- Score: 15.325910109153616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively. This paper introduces an innovative approach utilizing Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way. We developed a framework where instructions and dynamic environment descriptions are input into the LLM. The LLM then utilizes this information to assist in generating rewards, thereby steering the behavior of RL agents towards patterns that more closely resemble human driving. The experimental results demonstrate that this approach not only makes RL agents more anthropomorphic but also achieves better performance. Additionally, various strategies for reward-proxy and reward-shaping are investigated, revealing the significant impact of prompt design on shaping an AD vehicle's behavior. These findings offer a promising direction for the development of more advanced and human-like automated driving systems. Our experimental data and source code can be found here.
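To make the framework described in the abstract concrete, the sketch below shows one way an LLM could supply a reward-shaping term from a driving instruction and a textual state description. This is a minimal illustration, not the authors' implementation: the names `describe_state` and `llm_shaping_reward`, the prompt wording, and the 0.5 shaping weight are assumptions, and the LLM is stubbed out as a plain callable.

```python
# Minimal sketch of LLM-assisted reward shaping for an RL driving agent.
# All names, prompts, and weights are illustrative assumptions.
import re
from typing import Callable

def describe_state(speed_mps: float, lane_offset_m: float, gap_to_lead_m: float) -> str:
    """Turn a few ego-vehicle observations into a natural-language state description."""
    return (f"Ego speed: {speed_mps:.1f} m/s. "
            f"Lateral offset from lane center: {lane_offset_m:.2f} m. "
            f"Gap to lead vehicle: {gap_to_lead_m:.1f} m.")

def llm_shaping_reward(llm: Callable[[str], str], instruction: str, state_text: str) -> float:
    """Ask the LLM for a bounded human-likeness score; fall back to 0.0 if the reply is unusable."""
    prompt = (
        "You are scoring how human-like the current driving state is.\n"
        f"Instruction: {instruction}\n"
        f"State: {state_text}\n"
        "Reply with a single number between -1 and 1."
    )
    match = re.search(r"-?\d+(\.\d+)?", llm(prompt))
    return 0.0 if match is None else max(-1.0, min(1.0, float(match.group())))

if __name__ == "__main__":
    fake_llm = lambda prompt: "0.6"  # stand-in for a real LLM call
    state = describe_state(speed_mps=22.0, lane_offset_m=0.1, gap_to_lead_m=35.0)
    base_reward = 1.0  # reward returned by the driving environment
    total = base_reward + 0.5 * llm_shaping_reward(fake_llm, "Drive smoothly and keep a safe gap.", state)
    print(total)
```

In practice the callable would wrap a real LLM endpoint, and the prompt and shaping weight would be tuned; the paper itself investigates several reward-proxy and reward-shaping strategies beyond this simple additive form.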
Related papers
- Automating Traffic Model Enhancement with AI Research Agent [4.420199777075044]
Traffic Research Agent (TR-Agent) is an AI-driven system designed to autonomously develop and refine traffic models.
TR-Agent achieves significant performance improvements across multiple traffic models.
To further support research and collaboration, we have open-sourced both the code and data used in our experiments.
arXiv Detail & Related papers (2024-09-25T12:42:25Z)
- Generating and Evolving Reward Functions for Highway Driving with Large Language Models [18.464822261908562]
Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies.
We introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving.
arXiv Detail & Related papers (2024-06-15T07:50:10Z)
- REvolve: Reward Evolution with Large Language Models using Human Feedback [6.4550546442058225]
Large language models (LLMs) have been used for reward generation from natural language task descriptions.
LLMs, guided by human feedback, can be used to formulate reward functions that reflect human implicit knowledge.
We introduce REvolve, a truly evolutionary framework that uses LLMs for reward design in reinforcement learning.
arXiv Detail & Related papers (2024-06-03T13:23:27Z)
- Ego-Foresight: Agent Visuomotor Prediction as Regularization for RL [34.6883445484835]
Ego-Foresight is a self-supervised method for disentangling agent and environment based on motion and prediction.
We show that visuomotor prediction of the agent provides regularization to the RL algorithm, by encouraging the actions to stay within predictable bounds.
We integrate Ego-Foresight with a model-free RL algorithm to solve simulated robotic manipulation tasks, showing an average improvement of 23% in efficiency and 8% in performance.
arXiv Detail & Related papers (2024-05-27T13:32:43Z)
- SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample-efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z)
- HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving [2.807187711407621]
We propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework.
We first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM).
In this paradigm, the human expert serves as a mentor to the AI agent, and the agent is guided to minimize traffic flow disturbance.
arXiv Detail & Related papers (2024-01-06T08:30:14Z)
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms; a rough sketch of this idea follows this entry.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
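As an illustration of the entry above, the sketch below computes an intrinsic bonus as the disagreement (standard deviation) across an ensemble of learned reward models and adds it to their mean prediction. This is an assumed rendering of the idea, not the paper's code; the class and function names, network sizes, and the beta weight are made up for the example.

```python
# Illustrative sketch: intrinsic exploration bonus from uncertainty (ensemble
# disagreement) over learned reward models. Names and sizes are assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Tiny MLP mapping a state-action vector to a scalar reward estimate."""
    def __init__(self, input_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def reward_with_uncertainty_bonus(ensemble, state_action, beta=0.1):
    """Mean learned reward plus beta times the ensemble's std as an exploration bonus."""
    with torch.no_grad():
        preds = torch.stack([m(state_action) for m in ensemble])  # (n_models, batch)
    return preds.mean(dim=0) + beta * preds.std(dim=0)

if __name__ == "__main__":
    models = [RewardModel(input_dim=4) for _ in range(3)]
    batch = torch.randn(8, 4)  # hypothetical state-action features
    print(reward_with_uncertainty_bonus(models, batch))
```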
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model can automatically learn rewards from the user's actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z)