Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
- URL: http://arxiv.org/abs/2509.20102v1
- Date: Wed, 24 Sep 2025 13:27:35 GMT
- Title: Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
- Authors: Tong Nie, Yuewen Mei, Yihong Tang, Junlin He, Jie Sun, Haotian Shi, Wei Ma, Jian Sun,
- Abstract summary: Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. We introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining.
- Score: 58.37104890690234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. However, existing methods are often constrained to a single, fixed trade-off between competing objectives such as adversariality and realism. This yields behavior-specific models that cannot be steered at inference time, lacking the efficiency and flexibility to generate tailored scenarios for diverse training and testing requirements. In view of this, we reframe the task of adversarial scenario generation as a multi-objective preference alignment problem and introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining. We first propose hierarchical group-based preference optimization, a data-efficient offline alignment method that learns to balance competing objectives by decoupling hard feasibility constraints from soft preferences. Instead of training a fixed model, SAGE fine-tunes two experts on opposing preferences and constructs a continuous spectrum of policies at inference time by linearly interpolating their weights. We provide theoretical justification for this framework through the lens of linear mode connectivity. Extensive experiments demonstrate that SAGE not only generates scenarios with a superior balance of adversariality and realism but also enables more effective closed-loop training of driving policies. Project page: https://tongnie.github.io/SAGE/.
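The test-time steering step described above, linearly interpolating the weights of two experts fine-tuned on opposing preferences, reduces to a parameter-wise convex combination. A minimal sketch (function and parameter names are illustrative, not the authors' code), assuming both experts were fine-tuned from the same base model so their parameter tensors align, which is the setting where linear mode connectivity makes this interpolation sound:

```python
import numpy as np

def interpolate_weights(adv_params: dict, real_params: dict, alpha: float) -> dict:
    """Blend two expert parameter dictionaries element-wise.

    alpha = 1.0 recovers the adversarial expert, alpha = 0.0 the realism
    expert; intermediate values trace a continuous spectrum of policies.
    Both dictionaries must map the same parameter names to same-shape arrays.
    """
    assert adv_params.keys() == real_params.keys(), "experts must share an architecture"
    return {name: alpha * adv_params[name] + (1.0 - alpha) * real_params[name]
            for name in adv_params}
```

In practice the dictionaries would hold a policy network's state, and `alpha` becomes the single inference-time knob trading adversariality against realism.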
Related papers
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z) - Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning [52.03884701766989]
Offline reinforcement learning (RL) algorithms typically impose constraints on action selection. We propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. We develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint.
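The neighborhood constraint described above can be sketched as a filter over candidate target actions before the Bellman max. A minimal illustration assuming an L2 ball of radius `eps` around each dataset action (the function names and the choice of distance are ours for illustration, not the paper's API):

```python
import numpy as np

def neighborhood_q_target(candidate_actions: np.ndarray,
                          dataset_actions: np.ndarray,
                          q_values: np.ndarray,
                          eps: float) -> float:
    """Max Q over candidates lying within eps of at least one dataset action.

    candidate_actions: (N, d) candidate target actions
    dataset_actions:   (M, d) actions observed in the offline dataset
    q_values:          (N,)   Q estimates for each candidate
    """
    # Pairwise L2 distances between candidates and dataset actions: (N, M).
    dists = np.linalg.norm(
        candidate_actions[:, None, :] - dataset_actions[None, :, :], axis=-1)
    # A candidate survives if it falls in the union of neighborhoods.
    in_neighborhood = (dists <= eps).any(axis=1)
    if not in_neighborhood.any():
        raise ValueError("no candidate lies in any dataset-action neighborhood")
    return float(q_values[in_neighborhood].max())
```

The constraint keeps the target on-support without forcing it to equal a dataset action exactly, which is what distinguishes it from strict behavior cloning of targets.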
arXiv Detail & Related papers (2025-11-04T13:42:05Z) - DRIVE: Dynamic Rule Inference and Verified Evaluation for Constraint-Aware Autonomous Driving [37.24058519921229]
We present DRIVE, a novel framework for Dynamic Rule Inference and Verified Evaluation. DRIVE achieves 0.0% soft constraint violation rates, smoother trajectories, and stronger generalization across diverse driving scenarios. Verified evaluations further demonstrate the efficiency, explainability, and robustness of the framework for real-world deployment.
arXiv Detail & Related papers (2025-08-06T03:56:06Z) - Conformal Arbitrage: Risk-Controlled Balancing of Competing Objectives in Language Models [5.294604210205507]
We introduce Conformal Arbitrage, a framework that learns a data-driven threshold to mediate between a Primary model optimized for a primary objective and a more conservative Guardian. We observe that our method outperforms, in terms of accuracy, cost-matched random routing between models.
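The data-driven threshold above can be approximated by a simple empirical-risk rule on a held-out calibration set: pick the lowest confidence cutoff at which the Primary's observed error rate still meets the target. A sketch under stated assumptions (names are illustrative, and the finite-sample conformal correction that yields the formal guarantee is omitted):

```python
import numpy as np

def calibrate_threshold(scores, correct, target_risk: float) -> float:
    """Smallest confidence threshold t such that, among calibration examples
    the Primary answers with confidence >= t, its empirical error rate is
    at most target_risk. Returns inf if no threshold qualifies
    (i.e., always defer to the Guardian)."""
    order = np.argsort(scores)[::-1]  # most confident first
    scores = np.asarray(scores, dtype=float)[order]
    correct = np.asarray(correct)[order]
    errors = np.cumsum(~correct.astype(bool))        # errors in each prefix
    counts = np.arange(1, len(scores) + 1)           # prefix sizes
    ok = errors / counts <= target_risk              # prefixes meeting the bound
    if not ok.any():
        return np.inf
    k = np.where(ok)[0].max()  # largest qualifying prefix -> lowest threshold
    return float(scores[k])

def route(score: float, threshold: float, primary_answer, guardian_answer):
    """Answer with the Primary above the threshold, else the Guardian."""
    return primary_answer if score >= threshold else guardian_answer
```

Lowering `target_risk` raises the threshold, routing more queries to the conservative Guardian; this is the accuracy-versus-cost dial the abstract refers to.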
arXiv Detail & Related papers (2025-06-01T08:55:10Z) - From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving [29.36624509719055]
We propose SERA, a framework that enables autonomous driving systems to self-evolve by repairing failure cases through targeted scenario recommendation. By analyzing performance logs, SERA identifies failure patterns and dynamically retrieves semantically aligned scenarios from a structured bank. Experiments on the benchmark show that SERA consistently improves key metrics across multiple autonomous driving baselines, demonstrating its effectiveness and generalizability under safety-critical conditions.
arXiv Detail & Related papers (2025-05-28T07:46:19Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [53.03951222945921]
We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle. Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error. We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
arXiv Detail & Related papers (2024-07-24T12:00:30Z) - One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a perspective of dualization that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based settings.
arXiv Detail & Related papers (2024-05-29T22:12:52Z) - SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z) - An Adaptive Fuzzy Reinforcement Learning Cooperative Approach for the Autonomous Control of Flock Systems [4.961066282705832]
This work introduces an adaptive distributed robustness technique for the autonomous control of flock systems.
Its relatively flexible structure is based on online fuzzy reinforcement learning schemes which simultaneously target a number of objectives.
In addition to its resilience in the face of dynamic disturbances, the algorithm does not require more than the agent position as a feedback signal.
arXiv Detail & Related papers (2023-03-17T13:07:35Z) - A Unified Framework for Adversarial Attack and Defense in Constrained Feature Space [13.096022606256973]
We propose a unified framework to generate feasible adversarial examples that satisfy given domain constraints.
Our framework forms the starting point for research on constrained adversarial attacks and provides relevant baselines and datasets that research can exploit.
arXiv Detail & Related papers (2021-12-02T12:05:27Z) - Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance [110.63037190641414]
We propose to learn congestion patterns explicitly and devise a novel "Sense--Learn--Reason--Predict" framework.
By decomposing the learning phases into two stages, a "student" can learn contextual cues from a "teacher" while generating collision-free trajectories.
In experiments, we demonstrate that the proposed model is able to generate collision-free trajectory predictions in a synthetic dataset.
arXiv Detail & Related papers (2021-03-26T02:42:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.