Multi-Objective Reinforcement Learning for Critical Scenario Generation of Autonomous Vehicles
- URL: http://arxiv.org/abs/2502.15792v1
- Date: Tue, 18 Feb 2025 16:32:31 GMT
- Title: Multi-Objective Reinforcement Learning for Critical Scenario Generation of Autonomous Vehicles
- Authors: Jiahui Wu, Chengjie Lu, Aitor Arrieta, Shaukat Ali
- Abstract summary: We propose MOEQT, a novel multi-objective reinforcement learning (MORL)-based approach to generate critical scenarios that simultaneously test interdependent safety and functional requirements. We evaluate MOEQT using an advanced end-to-end AV controller and a high-fidelity simulator. Our evaluation results show that MOEQT achieves overall better performance than the baselines in identifying critical scenarios that violate multiple requirements.
- Score: 11.194377118552024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous vehicles (AVs) make driving decisions without human intervention, so ensuring their dependability is critical. Despite significant research and development effort, AV dependability assurance remains a major challenge due to the complexity and unpredictability of AV operating environments. Scenario-based testing evaluates AVs under various driving scenarios, but the unbounded space of potential scenarios makes it essential to identify critical scenarios that can violate safety or functional requirements. Such requirements are inherently interdependent and need to be tested simultaneously. To this end, we propose MOEQT, a novel multi-objective reinforcement learning (MORL)-based approach to generate critical scenarios that simultaneously test interdependent safety and functional requirements. MOEQT adapts Envelope Q-learning as its MORL algorithm, which dynamically adapts the multi-objective weights to balance the relative importance of the objectives. MOEQT generates critical scenarios that violate multiple requirements by dynamically interacting with the AV environment, enabling comprehensive AV testing. We evaluate MOEQT using an advanced end-to-end AV controller and a high-fidelity simulator, comparing it with two baselines: a random strategy and single-objective RL with a weighted reward function. Our results show that MOEQT outperforms both baselines at identifying critical scenarios that violate multiple requirements.
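To make the core mechanism concrete, below is a minimal sketch of the envelope Q-learning backup that MOEQT adapts (following Yang et al.'s Envelope Q-learning). It is illustrative only, not the authors' implementation: the state and action dimensions, the network architecture, and the two objectives (e.g., one safety and one functional requirement) are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: two objectives (e.g., a safety requirement and a
# functional requirement) and a small discrete scenario-perturbation space.
STATE_DIM, N_ACT, N_OBJ = 16, 5, 2

class MultiObjectiveQNet(nn.Module):
    """Q(s, w) -> vector-valued Q per action, shape (batch, N_ACT, N_OBJ).
    Conditioning on the preference weights w lets one network cover the
    whole trade-off front between objectives."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_OBJ, 128), nn.ReLU(),
            nn.Linear(128, N_ACT * N_OBJ),
        )

    def forward(self, s, w):
        return self.net(torch.cat([s, w], dim=-1)).view(-1, N_ACT, N_OBJ)

def envelope_target(q_net, r, s_next, w, gamma=0.99, n_pref=8):
    """Envelope backup: maximize the scalarized value w . Q(s', a; w') over
    both actions a and resampled preferences w', then bootstrap the full
    reward vector r (one entry per requirement)."""
    batch = s_next.size(0)
    with torch.no_grad():
        best_v = torch.full((batch, 1), float("-inf"))
        best_q = torch.zeros(batch, N_OBJ)
        prefs = torch.distributions.Dirichlet(torch.ones(N_OBJ)).sample((n_pref,))
        for w_p in prefs:                                  # candidate w'
            q = q_net(s_next, w_p.expand(batch, N_OBJ))    # (B, A, m)
            scal = (q * w.unsqueeze(1)).sum(-1)            # w . Q: (B, A)
            a_star = scal.argmax(dim=1)
            q_star = q[torch.arange(batch), a_star]        # (B, m)
            v = (q_star * w).sum(-1, keepdim=True)
            improved = (v > best_v).squeeze(-1)
            best_q[improved] = q_star[improved]
            best_v = torch.maximum(best_v, v)
    return r + gamma * best_q                              # vector TD target
```

Because the preference weights are resampled during training rather than fixed, a single learned network can be steered toward whichever requirement is currently hardest to violate; this is the "dynamically adapts multi-objective weights" behavior the abstract describes, in contrast to the single-objective baseline's fixed weighted reward.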
Related papers
- DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding [76.3876070043663]
We propose DriveLMM-o1, a dataset and benchmark designed to advance step-wise visual reasoning for autonomous driving.
Our benchmark features over 18k VQA examples in the training set and more than 4k in the test set, covering diverse questions on perception, prediction, and planning.
Our model achieves a +7.49% gain in final answer accuracy, along with a 3.62% improvement in reasoning score over the previous best open-source model.
arXiv Detail & Related papers (2025-03-13T17:59:01Z)
- CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models [1.6612510324510592]
CurricuVLM is a novel framework that enables personalized curriculum learning for autonomous driving agents.
Our approach exploits Vision-Language Models (VLMs) to analyze agent behavior, identify performance weaknesses, and dynamically generate tailored training scenarios.
CurricuVLM outperforms state-of-the-art baselines across both regular and safety-critical scenarios.
arXiv Detail & Related papers (2025-02-21T00:42:40Z)
- Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer [62.01554688056335]
Overestimation in the multiagent setting has received comparatively little attention. We propose a novel hypernet regularizer on hypernetwork weights and biases to constrain the optimization of the online global Q-network and prevent overestimation from accumulating.
arXiv Detail & Related papers (2025-02-04T05:14:58Z)
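As a rough illustration of the regularizer idea above (our own sketch under QMIX-style assumptions, not the paper's implementation): a hypernetwork generates the mixing network's weights and biases from the global state, and an L2 penalty on those generated parameters can be added to the TD loss so the online global Q-network cannot inflate its estimates unchecked. The coefficient `lam` is a hypothetical hyperparameter.

```python
import torch

def td_loss_with_hypernet_reg(q_tot, td_target, mix_w, mix_b, lam=1e-3):
    """TD loss plus an L2 penalty on hypernetwork-generated mixing weights
    and biases (`mix_w`, `mix_b`), discouraging the runaway magnitudes
    that drive overestimation accumulation."""
    td = ((q_tot - td_target.detach()) ** 2).mean()
    reg = mix_w.pow(2).mean() + mix_b.pow(2).mean()
    return td + lam * reg
```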
- Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large language models (MLLMs). We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs. We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
- Generating Critical Scenarios for Testing Automated Driving Systems [5.975915967339764]
AVASTRA is a Reinforcement Learning-based approach to generate realistic critical scenarios for testing Autonomous Driving Systems.
Results show AVASTRA's ability to outperform the state-of-the-art approach by generating 30% to 115% more collision scenarios.
arXiv Detail & Related papers (2024-12-03T16:59:30Z)
- AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit [59.10281630985958]
In question answering (QA), different questions can be effectively addressed with different answering strategies.
We develop a dynamic method that adaptively selects the most suitable QA strategy for each question.
Our experiments show that the proposed solution is viable for adaptive orchestration of a QA system with multiple modules.
arXiv Detail & Related papers (2024-09-20T12:28:18Z)
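As a generic sketch of how such per-question routing can work (not the AQA implementation; the feature dimension and strategy count below are invented), a LinUCB contextual bandit keeps one ridge-regression model per answering strategy and picks the strategy with the highest upper confidence bound for each question's feature vector:

```python
import numpy as np

class LinUCBRouter:
    """Contextual bandit over QA strategies; one linear model per arm."""
    def __init__(self, n_features, n_arms, alpha=1.0):
        self.alpha = alpha                                      # exploration bonus
        self.A = [np.eye(n_features) for _ in range(n_arms)]    # X^T X + I
        self.b = [np.zeros(n_features) for _ in range(n_arms)]  # X^T rewards

    def select(self, x):
        """Return the arm with the highest upper confidence bound at x."""
        ucbs = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                                   # ridge estimate
            ucbs.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(ucbs))

    def update(self, arm, x, reward):
        """Fold in the observed answer quality for the chosen strategy."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Hypothetical usage: route among three answering strategies.
router = LinUCBRouter(n_features=8, n_arms=3)
x = np.random.rand(8)                  # stand-in question features
arm = router.select(x)
router.update(arm, x, reward=1.0)      # e.g., answer judged correct
```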
- Stackelberg Driver Model for Continual Policy Improvement in Scenario-Based Closed-Loop Autonomous Driving [5.765939495779461]
Adversarial generation methods have emerged as a class of efficient approaches to synthesize safety-critical scenarios.
We tailor the Stackelberg Driver Model (SDM) to accurately characterize the hierarchical nature of vehicle interaction dynamics.
Our algorithm exhibits superior performance compared to several baselines especially in higher dimensional scenarios.
arXiv Detail & Related papers (2023-09-25T15:47:07Z)
- Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning [48.667697255912614]
Mean-field reinforcement learning addresses the policy of a representative agent interacting with an infinite population of identical agents.
We propose Safe-M3-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions.
Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
arXiv Detail & Related papers (2023-06-29T15:57:07Z)
- Evolving Testing Scenario Generation Method and Intelligence Evaluation Framework for Automated Vehicles [12.670180834651912]
This paper proposes an evolving scenario generation method that utilizes deep reinforcement learning (DRL) to create human-like background vehicles (BVs) for testing and intelligence evaluation of automated vehicles (AVs).
The results demonstrate that the proposed evolving scenario exhibits the highest level of complexity compared to other baseline scenarios and has more than 85% similarity to naturalistic driving data.
arXiv Detail & Related papers (2023-06-12T14:26:12Z)
- BOtied: Multi-objective Bayesian optimization with tied multivariate ranks [33.414682601242006]
In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function.
Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied.
Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions.
arXiv Detail & Related papers (2023-06-01T04:50:06Z)
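The connection between non-dominated solutions and the joint CDF can be seen with a toy empirical version of the indicator (our own simplification, not the paper's estimator): under minimization, F(y) is the fraction of observed points that are componentwise no worse than y, and non-dominated points attain the extreme (minimal) value.

```python
import numpy as np

def empirical_joint_cdf(Y):
    """F(y_i) = fraction of points z with z <= y_i in every objective
    (minimization). Non-dominated points attain the minimum, 1/n."""
    leq = (Y[None, :, :] <= Y[:, None, :]).all(axis=2)  # leq[i, j]: y_j <= y_i
    return leq.sum(axis=1) / len(Y)

Y = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
print(empirical_joint_cdf(Y))  # [0.25 0.25 0.5 0.25]; only (3, 3) is dominated
```

Ranking candidates by this score, lowest first, recovers the extreme-quantile link the paper builds its acquisition function on.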
- Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field that builds on conventional reinforcement learning (RL).
This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.