Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing
- URL: http://arxiv.org/abs/2504.08704v1
- Date: Fri, 11 Apr 2025 17:11:21 GMT
- Title: Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing
- Authors: Vinal Asodia, Zhenhua Feng, Saber Fallah,
- Abstract summary: This paper presents a novel pipeline for generating human-aligned reward labels.<n>The pipeline incorporates an adaptive safety component, activated by analyzing semantic segmentation maps.<n>The results indicate that the generated reward labels closely match the simulation reward labels.
- Score: 13.342097008372479
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective leveraging of real-world driving datasets is crucial for enhancing the training of autonomous driving systems. While Offline Reinforcement Learning enables the training of autonomous vehicles using such data, most available datasets lack meaningful reward labels. Reward labeling is essential as it provides feedback for the learning algorithm to distinguish between desirable and undesirable behaviors, thereby improving policy performance. This paper presents a novel pipeline for generating human-aligned reward labels. The proposed approach addresses the challenge of absent reward signals in real-world datasets by generating labels that reflect human judgment and safety considerations. The pipeline incorporates an adaptive safety component, activated by analyzing semantic segmentation maps, allowing the autonomous vehicle to prioritize safety over efficiency in potential collision scenarios. The proposed pipeline is applied to an occluded pedestrian crossing scenario with varying levels of pedestrian traffic, using synthetic and simulation data. The results indicate that the generated reward labels closely match the simulation reward labels. When used to train the driving policy using Behavior Proximal Policy Optimisation, the results are competitive with other baselines. This demonstrates the effectiveness of our method in producing reliable and human-aligned reward signals, facilitating the training of autonomous driving systems through Reinforcement Learning outside of simulation environments and in alignment with human values.
Related papers
- Enhancing Autonomous Driving Safety with Collision Scenario Integration [36.83682052117178]
We propose SafeFusion, a training framework to learn from collision data.<n>Instead of over-relying on imitation learning, SafeFusion integrates safety-oriented metrics during training to enable collision avoidance learning.<n>We also propose CollisionGen, a scalable data generation pipeline to generate diverse, high-quality scenarios.
arXiv Detail & Related papers (2025-03-05T23:08:43Z) - Traffic and Safety Rule Compliance of Humans in Diverse Driving Situations [48.924085579865334]
Analyzing human data is crucial for developing autonomous systems that replicate safe driving practices.
This paper presents a comparative evaluation of human compliance with traffic and safety rules across multiple trajectory prediction datasets.
arXiv Detail & Related papers (2024-11-04T09:21:00Z) - DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving [81.04174379726251]
This paper collects a comprehensive end-to-end driving dataset named DriveCoT.
It contains sensor data, control decisions, and chain-of-thought labels to indicate the reasoning process.
We propose a baseline model called DriveCoT-Agent, trained on our dataset, to generate chain-of-thought predictions and final decisions.
arXiv Detail & Related papers (2024-03-25T17:59:01Z) - FedDriveScore: Federated Scoring Driving Behavior with a Mixture of
Metric Distributions [6.195950768412144]
Vehicle-cloud collaboration is proposed as a privacy-friendly alternative to centralized learning.
This framework includes a consistently federated version of the scoring method to reduce the performance degradation of the global scoring model.
arXiv Detail & Related papers (2024-01-13T02:15:41Z) - Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning
in Surgical Robotic Environments [4.2569494803130565]
We introduce an innovative algorithm designed to assign rewards to offline trajectories, using a small number of high-quality expert demonstrations.
This approach circumvents the need for handcrafted rewards, unlocking the potential to harness vast datasets for policy learning.
arXiv Detail & Related papers (2023-10-13T03:39:15Z) - Unsupervised Domain Adaptation for Self-Driving from Past Traversal
Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatial quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z) - Exploring the trade off between human driving imitation and safety for
traffic simulation [0.34410212782758043]
We show that a trade-off exists between imitating human driving and maintaining safety when learning driving policies.
We propose a multi objective learning algorithm (MOPPO) that improves both objectives together.
arXiv Detail & Related papers (2022-08-09T14:30:19Z) - Tackling Real-World Autonomous Driving using Deep Reinforcement Learning [63.3756530844707]
In this work, we propose a model-free Deep Reinforcement Learning Planner training a neural network that predicts acceleration and steering angle.
In order to deploy the system on board the real self-driving car, we also develop a module represented by a tiny neural network.
arXiv Detail & Related papers (2022-07-05T16:33:20Z) - Learning Interactive Driving Policies via Data-driven Simulation [125.97811179463542]
Data-driven simulators promise high data-efficiency for driving policy learning.
Small underlying datasets often lack interesting and challenging edge cases for learning interactive driving.
We propose a simulation method that uses in-painted ado vehicles for learning robust driving policies.
arXiv Detail & Related papers (2021-11-23T20:14:02Z) - Deep Structured Reactive Planning [94.92994828905984]
We propose a novel data-driven, reactive planning objective for self-driving vehicles.
We show that our model outperforms a non-reactive variant in successfully completing highly complex maneuvers.
arXiv Detail & Related papers (2021-01-18T01:43:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.