Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving
- URL: http://arxiv.org/abs/2601.01800v1
- Date: Mon, 05 Jan 2026 05:20:16 GMT
- Title: Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving
- Authors: Qi Wei, Junchao Fan, Zhao Yang, Jianhua Wang, Jingkai Mao, Xiaolin Chang,
- Abstract summary: We introduce criticality-aware robust RL (CARRL) for handling sparse, safety-critical risks in autonomous driving.<n>CARRL consists of two interacting components: a risk exposure adversary (REA) and a risk-targeted robust agent (RTRA)<n>We show that our approach reduces the collision rate by at least 22.66% across all cases compared to state-of-the-art baseline methods.
- Score: 11.62520853262219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has shown considerable potential in autonomous driving (AD), yet its vulnerability to perturbations remains a critical barrier to real-world deployment. As a primary countermeasure, adversarial training improves policy robustness by training the AD agent in the presence of an adversary that deliberately introduces perturbations. Existing approaches typically model the interaction as a zero-sum game with continuous attacks. However, such designs overlook the inherent asymmetry between the agent and the adversary and then fail to reflect the sparsity of safety-critical risks, rendering the achieved robustness inadequate for practical AD scenarios. To address these limitations, we introduce criticality-aware robust RL (CARRL), a novel adversarial training approach for handling sparse, safety-critical risks in autonomous driving. CARRL consists of two interacting components: a risk exposure adversary (REA) and a risk-targeted robust agent (RTRA). We model the interaction between the REA and RTRA as a general-sum game, allowing the REA to focus on exposing safety-critical failures (e.g., collisions) while the RTRA learns to balance safety with driving efficiency. The REA employs a decoupled optimization mechanism to better identify and exploit sparse safety-critical moments under a constrained budget. However, such focused attacks inevitably result in a scarcity of adversarial data. The RTRA copes with this scarcity by jointly leveraging benign and adversarial experiences via a dual replay buffer and enforces policy consistency under perturbations to stabilize behavior. Experimental results demonstrate that our approach reduces the collision rate by at least 22.66\% across all cases compared to state-of-the-art baseline methods.
Related papers
- Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models [62.16655896700062]
Activation steering is a technique to enhance the utility of Large Language Models (LLMs)<n>We show that it unintentionally introduces critical and under-explored safety risks.<n>Experiments reveal that these interventions act as a force multiplier, creating new vulnerabilities to jailbreaks and increasing attack success rates to over 80% on standard benchmarks.
arXiv Detail & Related papers (2026-02-03T12:32:35Z) - Self-Guard: Defending Large Reasoning Models via enhanced self-reflection [54.775612141528164]
Self-Guard is a lightweight safety defense framework for Large Reasoning Models.<n>It bridges the awareness-compliance gap, achieving robust safety performance without compromising model utility.<n>Self-Guard exhibits strong generalization across diverse unseen risks and varying model scales.
arXiv Detail & Related papers (2026-01-31T13:06:11Z) - SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment [43.86865924673546]
We propose SafeThinker, an adaptive framework that allocates defensive resources via a lightweight gateway classifier.<n> Experiments show that SafeThinker significantly lowers attack success rates across diverse jailbreak strategies without compromising robustness.
arXiv Detail & Related papers (2026-01-23T07:12:53Z) - UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning [15.028168889991795]
We propose a novel approach, Uncertainty-Aware Critic Ensemble for robust adversarial Reinforcement learning (UACER)<n>In this paper, we propose a novel approach, Uncertainty-Aware Critic Ensemble for robust adversarial Reinforcement learning (UACER)
arXiv Detail & Related papers (2025-12-11T10:14:13Z) - Controllable risk scenario generation from human crash data for autonomous vehicle testing [13.3074428571403]
Controllable Risk Agent Generation (CRAG) is a framework designed to unify the modeling of dominant nominal behaviors and rare safety-critical behaviors.<n>CRAG constructs a structured latent space that disentangles normal and risk-related behaviors, enabling efficient use of limited crash data.
arXiv Detail & Related papers (2025-11-27T04:53:18Z) - Robust Driving Control for Autonomous Vehicles: An Intelligent General-sum Constrained Adversarial Reinforcement Learning Approach [56.34189898996741]
We propose a novel robust autonomous driving approach that consists of a strategic targeted adversary and a robust driving agent.<n>IGCARL improves the success rate by at least 27.9% over state-of-the-art methods, demonstrating superior robustness to adversarial attacks.
arXiv Detail & Related papers (2025-10-10T06:21:36Z) - RADE: Learning Risk-Adjustable Driving Environment via Multi-Agent Conditional Diffusion [17.46462636610847]
Risk- Driving Environment (RADE) is a simulation framework that generates statistically realistic and risk-adjustable traffic scenes.<n>RADE learns risk-conditioned behaviors directly from data, preserving naturalistic multi-agent interactions with controllable risk levels.<n>We validate RADE on the real-world rounD dataset, demonstrating that it preserves statistical realism across varying risk levels.
arXiv Detail & Related papers (2025-05-06T04:41:20Z) - On Minimizing Adversarial Counterfactual Error in Adversarial RL [18.044879441434432]
adversarial noise poses significant risks in safety-critical scenarios.<n>We introduce a novel objective called Adversarial Counterfactual Error (ACoE)<n>Our method significantly outperforms current state-of-the-art approaches for addressing adversarial RL challenges.
arXiv Detail & Related papers (2024-06-07T08:14:24Z) - SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z) - Seeing is not Believing: Robust Reinforcement Learning against Spurious
Correlation [57.351098530477124]
We consider one critical type of robustness against spurious correlation, where different portions of the state do not have correlations induced by unobserved confounders.
A model that learns such useless or even harmful correlation could catastrophically fail when the confounder in the test case deviates from the training one.
Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge.
arXiv Detail & Related papers (2023-07-15T23:53:37Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - SAAC: Safe Reinforcement Learning as an Adversarial Game of
Actor-Critics [11.132587007566329]
We develop a safe adversarially guided soft actor-critic framework called SAAC.
In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function.
We show that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints.
arXiv Detail & Related papers (2022-04-20T12:32:33Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial of perturbation the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.