Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF
- URL: http://arxiv.org/abs/2406.04481v1
- Date: Thu, 6 Jun 2024 20:10:34 GMT
- Title: Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF
- Authors: Yuan Sun, Navid Salami Pargoo, Peter J. Jin, Jorge Ortiz
- Abstract summary: Reinforcement Learning from Human Feedback (RLHF) is popular in large language models (LLMs).
RLHF is usually applied in the fine-tuning step, requiring direct human "preferences."
We will validate our model using data gathered from real-life testbeds located in New Jersey and New York City.
- Score: 2.499371729440073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning from Human Feedback (RLHF) is popular in large language models (LLMs), whereas traditional Reinforcement Learning (RL) often falls short. Current autonomous driving methods typically rely either on human feedback within machine learning (including RL) or on LLMs. Most such feedback guides the car agent's learning process (e.g., controlling the car). RLHF is usually applied in the fine-tuning step and requires direct human "preferences," which are not commonly used in optimizing autonomous driving models. In this research, we combine RLHF and LLMs to enhance autonomous driving safety. Because training a model with human guidance from scratch is inefficient, our framework starts with a pre-trained autonomous car agent model and introduces multiple human-controlled agents, such as cars and pedestrians, to simulate real-life road environments. The autonomous car model is not directly controlled by humans. We integrate both physical and physiological feedback to fine-tune the model, and we use LLMs to optimize this process. This multi-agent interactive environment ensures safe, realistic interactions before real-world deployment. Finally, we will validate our model using data gathered from real-life testbeds located in New Jersey and New York City.
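To make the fine-tuning loop described in the abstract concrete, the sketch below shows one plausible way to fold human-derived feedback (interventions from the human-controlled agents, physiological signals, and an LLM-produced score) into a scalar reward and apply a simple policy-gradient update to a pre-trained driving policy. All names (`DrivingPolicy`, `feedback_to_reward`, `fine_tune_step`) and the fixed fusion weights are illustrative assumptions, not the authors' implementation; in the paper the fusion and optimization are mediated by an LLM.

```python
# Hypothetical sketch of an RLHF-style fine-tuning step for a pre-trained
# driving policy. The feedback aggregation and all names are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    """Tiny stand-in for a pre-trained driving agent (state -> action logits)."""
    def __init__(self, state_dim=16, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)

def feedback_to_reward(physical, physiological, llm_score):
    """Fold the three feedback channels into one scalar reward per step.
    Fixed weights are placeholders; the paper leaves the fusion to an LLM."""
    return 0.5 * physical + 0.3 * physiological + 0.2 * llm_score

def fine_tune_step(policy, optimizer, batch):
    """One REINFORCE-style update on a batch of logged simulator interactions."""
    states = batch["states"]    # (B, state_dim) observations from the simulation
    actions = batch["actions"]  # (B,) long tensor of actions the agent took
    rewards = feedback_to_reward(batch["physical"],
                                 batch["physiological"],
                                 batch["llm_score"])  # (B,) human-derived rewards
    log_probs = torch.log_softmax(policy(states), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    # Push up the probability of actions that human-derived rewards favour.
    loss = -(chosen * (rewards - rewards.mean())).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the framework described above, the batch would come from rollouts in the multi-agent simulation environment with human-controlled cars and pedestrians, rather than from direct human control of the autonomous car itself.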
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Sparse Rewards Can Self-Train Dialogue Agents [22.799506097310008]
We introduce a novel self-improvement paradigm that empowers LLM agents to autonomously enhance their performance without external human feedback.
We present ToolWOZ, a sparse reward tool-calling simulation environment derived from MultiWOZ.
We demonstrate that models trained with JOSH, both small and frontier, significantly improve tool-based interactions while preserving general model capabilities across diverse benchmarks.
arXiv Detail & Related papers (2024-09-06T21:00:57Z)
- MetaFollower: Adaptable Personalized Autonomous Car Following [63.90050686330677]
We propose an adaptable personalized car-following framework - MetaFollower.
We first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various CF events.
We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability (a brief IDM sketch is given after this list).
arXiv Detail & Related papers (2024-06-23T15:30:40Z)
- SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation [54.97931304488993]
Self-improving robots that interact and improve with experience are key to the real-world deployment of robotic systems.
We propose an online learning method, SELFI, that leverages online robot experience to rapidly fine-tune pre-trained control policies.
We report improvements in terms of collision avoidance, as well as more socially compliant behavior, measured by a human user study.
arXiv Detail & Related papers (2024-03-01T21:27:03Z)
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction-following, data-collecting robots that can be aligned to human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
- HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving [2.807187711407621]
We propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework.
We first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM).
In this paradigm, the human expert serves as a mentor to the AI agent, while the agent could be guided to minimize traffic flow disturbance.
arXiv Detail & Related papers (2024-01-06T08:30:14Z)
- Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z)
- FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing [71.76084256567599]
We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL).
Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations.
The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas which impede the robot's motion, approaching the performance of a human driver using a similar first-person interface over the course of training.
arXiv Detail & Related papers (2023-04-19T17:33:47Z)
- Incorporating Voice Instructions in Model-Based Reinforcement Learning for Self-Driving Cars [12.716258111815312]
We propose incorporating natural language voice instructions (NLI) in model-based deep reinforcement learning to train self-driving cars.
The results show that NLI can help ease the training process and significantly boost the agents' learning speed.
arXiv Detail & Related papers (2022-06-21T10:55:39Z)
- DDPG car-following model with real-world human driving experience in CARLA [0.0]
We propose a two-stage Deep Reinforcement Learning (DRL) method that learns from real-world human driving to achieve performance superior to that of the pure DRL agent.
For evaluation, we designed different real-world driving scenarios to compare the proposed two-stage DRL agent with the pure DRL agent.
arXiv Detail & Related papers (2021-12-29T15:22:31Z)
- Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning [13.699336307578488]
The deep imitative reinforcement learning approach (DIRL) achieves agile autonomous racing using visual inputs.
We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z)
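The MetaFollower entry above pairs a learned LSTM component with the Intelligent Driver Model (IDM); for readers unfamiliar with IDM, here is a minimal self-contained sketch of the standard formulation. The parameter defaults are common textbook values, not values calibrated in that paper.

```python
# Minimal Intelligent Driver Model (IDM) sketch, referenced from the
# MetaFollower entry above. Parameter values are common defaults, not the
# paper's calibrated ones.
import math

def idm_acceleration(v, v_lead, gap,
                     v0=30.0,    # desired speed (m/s)
                     T=1.5,      # desired time headway (s)
                     a_max=1.0,  # maximum acceleration (m/s^2)
                     b=2.0,      # comfortable deceleration (m/s^2)
                     s0=2.0,     # minimum standstill gap (m)
                     delta=4.0): # acceleration exponent
    """Return the follower's acceleration given its speed, the leader's speed,
    and the bumper-to-bumper gap."""
    dv = v - v_lead  # closing rate: positive when approaching the leader
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# Example: following a slower leader at a 20 m gap.
print(idm_acceleration(v=25.0, v_lead=20.0, gap=20.0))
```

The desired dynamic gap `s_star` compared against the actual gap is what makes IDM interpretable and collision-averse, which is why MetaFollower combines it with an LSTM to capture temporal heterogeneity.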
This list is automatically generated from the titles and abstracts of the papers on this site.