Fugu-MT Paper Translation (Abstract): From Attacks to Curricula: Learnability-Guided Adversarial Training for Safe Autonomous Driving

Paper overview: From Attacks to Curricula: Learnability-Guided Adversarial Training for Safe Autonomous Driving

arxiv url: http://arxiv.org/abs/2606.14032v1
Date: Fri, 12 Jun 2026 02:13:55 GMT
Status: Translation complete
Last updated in system: 2026-06-15 16:00:42.712026
Title: From Attacks to Curricula: Learnability-Guided Adversarial Training for Safe Autonomous Driving
Title (reference translation): From Attacks to Curricula: Learnability-Guided Adversarial Training for Safe Autonomous Driving
Authors: Yuewen Mei, Tong Nie, Jie Sun, Haotian Shi, Wei Ma, Jian Sun
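Abstract summary: AlignADV is a learnability-guided closed-loop adversarial training framework. It converts adversarial scenarios into resolvable, capability-aligned curricula. In experiments, it reduced training steps by up to 40.6%.
Reference score (independently computed attention score): 56.30087557121323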
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Closed-loop adversarial training improves autonomous driving safety by exposing policies to rare safety-critical scenarios. Standard pipelines first generate adversarial scenarios and then sample them for policy optimization. However, most existing frameworks remain attack-oriented: collision-driven generators often synthesize unsolvable extreme situations, which can degrade learning, while heuristic samplers ignore the evolving capability of the driving policy, causing sample inefficiency and delayed convergence. We propose AlignADV, a learnability-guided closed-loop adversarial training framework that converts adversarial scenarios into resolvable and capability-aligned curricula. First, we reformulate adversarial scenario generation as a preference alignment problem and employ direct preference optimization to guide the generator toward critical yet resolvable scenarios. Second, we introduce behavioral fingerprints to capture the intrinsic characteristics of the evolving policy and construct a multi-modal capability prediction model that estimates policy performance without expensive closed-loop simulations. By combining resolvability-aligned scenarios with capability predictions, AlignADV develops a dynamic curriculum sampling mechanism that prioritizes scenarios targeting the current policy's vulnerabilities. Experiments on the Waymo Open Motion Dataset demonstrate that AlignADV improves convergence efficiency and final performance, reducing training steps by up to 40.6 percent compared with baseline methods while lowering collision rate and improving route completion under both normal and adversarial traffic conditions. These results highlight a shift from attack-oriented scenario generation to learnability-guided policy improvement, offering a principled direction for safer and more efficient autonomous driving training. Project page: https://meiyuewen.github.io/AlignADV/.
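Abstract (reference translation): Closed-loop adversarial training improves the safety of autonomous driving by exposing policies to rare safety-critical scenarios. Standard pipelines first generate adversarial scenarios and then sample them for policy optimization. However, most existing frameworks are attack-oriented: collision-driven generators often synthesize unsolvable extreme situations that degrade learning, while heuristic samplers ignore the evolving capability of the driving policy, causing sample inefficiency and delayed convergence. This paper proposes AlignADV, a learnability-guided closed-loop adversarial training framework that converts adversarial scenarios into resolvable, capability-aligned curricula. First, adversarial scenario generation is reformulated as a preference alignment problem, and direct preference optimization is used to guide the generator toward scenarios that are critical yet resolvable. Second, behavioral fingerprints are introduced to capture the intrinsic characteristics of the evolving policy, and a multi-modal capability prediction model is built to estimate policy performance without expensive closed-loop simulation. By combining resolvability-aligned scenarios with capability predictions, AlignADV develops a dynamic curriculum sampling mechanism that prioritizes scenarios targeting the current policy's vulnerabilities. In experiments on the Waymo Open Motion Dataset, AlignADV improves convergence efficiency and final performance, reducing training steps by up to 40.6% compared with baseline methods while lowering the collision rate and improving route completion under both normal and adversarial traffic conditions. These results highlight a shift from attack-oriented scenario generation to learnability-guided policy improvement, offering a principled direction for safer and more efficient autonomous driving training. Project page: https://meiyuewen.github.io/AlignADV/.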
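The abstract names two mechanisms without giving formulas: direct preference optimization (DPO) for the scenario generator and a capability-driven curriculum sampler. The sketch below is illustrative only and is not the authors' implementation. `dpo_loss` is the standard DPO objective for a single preference pair; how AlignADV builds the pairs (for example, treating a resolvable scenario as preferred over an unsolvable one) is an assumption here. `curriculum_probs` is a hypothetical sampler that turns predicted failure probabilities into sampling weights with a softmax; the function names, `beta`, and `temperature` are all assumptions, not values from the paper.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: log-likelihood of the preferred / rejected scenario
    under the generator being trained; ref_logp_*: the same under a frozen
    reference generator. beta is an assumed scaling hyperparameter.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def curriculum_probs(pred_failure, temperature=0.5):
    """Hypothetical sampler: softmax over predicted failure probabilities.

    Scenarios the current policy is predicted to fail more often receive
    a higher sampling probability. This weighting rule is an assumption.
    """
    logits = [p / temperature for p in pred_failure]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Generator already prefers the resolvable scenario slightly more
    # than the reference does, so the loss is below log(2).
    print(dpo_loss(logp_w=-1.0, logp_l=-2.0, ref_logp_w=-1.5, ref_logp_l=-1.5))
    # Predicted failure probabilities for three scenarios (made-up inputs).
    print(curriculum_probs([0.1, 0.5, 0.9]))
```

In this sketch the predicted failure probabilities would come from something like the capability prediction model described in the abstract, which is what would let the sampler avoid running closed-loop simulations for every candidate scenario.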


Related papers