Fugu-MT Paper Translation (Abstract): Scaling Self-Play for End-to-End Driving

Paper summary: Scaling Self-Play for End-to-End Driving

arxiv url: http://arxiv.org/abs/2606.19641v2
Date: Fri, 19 Jun 2026 17:53:33 GMT
Status: Translation complete
Last updated in system: 2026-06-24 16:10:14.864989
Title: Scaling Self-Play for End-to-End Driving
Title (reference translation): Scaling Self-Play in End-to-End Driving
Authors: Luke Rowe, Roger Girgis, Rodrigue de Schaetzen, Daphne Cornelisse, Alaap Grandhi, Felix Heide, Eugene Vinitsky, Christopher Pal, Liam Paull
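Abstract summary: Gigapixel is a high-throughput batched driving simulator with perspective rendering. We train pixel-based policies in self-play via on-policy distillation from a privileged RL teacher. We transfer the self-play trained policies to real-world sensor data through lightweight perception adaptation.
Reference score (attention score computed independently by this site): 40.15606566638922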
License: http://creativecommons.org/licenses/by/4.0/
Abstract: End-to-end autonomous driving models are typically trained on offline human-demonstration datasets that provide limited state coverage and often no closed-loop feedback, making them prone to compounding errors when deployed in closed-loop and brittle to long-tail agent interactions. To overcome these limitations, we propose an alternative strategy for training end-to-end driving models: large-scale self-play directly from pixels in simulation. While prior self-play approaches have shown promising transfer to real-world driving, they typically assume vectorized Bird's-Eye-View (BEV) observations that are incompatible with end-to-end policies operating directly on sensor observations. To this end, we introduce Gigapixel, a high-throughput batched driving simulator with perspective rendering, enabling scalable self-play directly from pixel observations. Rather than targeting compute-costly photorealistic sensor simulation, Gigapixel renders a simplified bounding-box world that preserves essential scene structure while achieving throughput at 50k agent steps per second. Since direct pixel-space self-play RL is prohibitively sample-inefficient at end-to-end model scale, we propose self-play DAgger training: we train pixel-based policies in self-play via on-policy distillation from a privileged RL teacher. To bridge the sim-to-real gap, we subsequently transfer the self-play trained policies to real-world sensor data through lightweight perception adaptation. Policies trained in Gigapixel and adapted to real-world sensor data achieve competitive performance on the HUGSIM and NAVSIM-v2 benchmarks without human trajectory supervision. Moreover, scaling self-play training yields proportional gains in policy performance, establishing self-play as a practical and scalable strategy for training end-to-end models.
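Abstract (reference translation): End-to-end autonomous driving models are usually trained on offline human-demonstration datasets, which offer limited state coverage and often no closed-loop feedback. As a result, errors compound when the models are deployed in closed loop, and the models are brittle in long-tail agent interactions. To overcome these limitations, we propose an alternative strategy for training end-to-end driving models: large-scale self-play directly from pixels in simulation. Earlier self-play approaches have shown promising transfer to real-world driving, but they typically assume vectorized Bird's-Eye-View (BEV) observations, which are incompatible with end-to-end policies that operate directly on sensor observations. To this end, we introduce Gigapixel, a high-throughput batched driving simulator with perspective rendering that enables scalable self-play directly from pixel observations. Rather than targeting compute-costly photorealistic sensor simulation, Gigapixel renders a simplified bounding-box world that preserves essential scene structure while reaching a throughput of 50k agent steps per second. Because direct pixel-space self-play RL is prohibitively sample-inefficient at end-to-end model scale, we propose self-play DAgger training, in which pixel-based policies are trained in self-play via on-policy distillation from a privileged RL teacher. To bridge the sim-to-real gap, we then transfer the self-play trained policies to real-world sensor data through lightweight perception adaptation. Policies trained in Gigapixel and adapted to real-world sensor data achieve competitive performance on the HUGSIM and NAVSIM-v2 benchmarks without human trajectory supervision. Moreover, scaling self-play training yields proportional gains in policy performance, establishing self-play as a practical and scalable strategy for training end-to-end models.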
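The abstract describes self-play DAgger only at a high level, so the following is a minimal toy sketch of the generic DAgger loop (student rolls out, privileged teacher labels the visited states, student is refit on the aggregated data), not the authors' implementation. Everything in it is an assumption made for illustration: the one-dimensional state, the hand-coded teacher with gain `K` (the paper uses an RL-trained teacher), the affine `render` function standing in for pixel rendering, the linear student, and the single agent (the paper trains many agents in self-play).

```python
import random

K = 0.5  # hypothetical teacher gain, chosen only for this toy example


def render(x):
    """Toy stand-in for a renderer: maps the true state to an observation."""
    return 2.0 * x + 1.0


def teacher_action(x):
    """Privileged teacher: acts on the true state x, which the student never sees."""
    return -K * x


def fit_linear(data):
    """Ordinary least squares for the student policy a = w * obs + b."""
    n = len(data)
    mean_o = sum(o for o, _ in data) / n
    mean_a = sum(a for _, a in data) / n
    var_o = sum((o - mean_o) ** 2 for o, _ in data)
    cov = sum((o - mean_o) * (a - mean_a) for o, a in data)
    w = cov / var_o
    b = mean_a - w * mean_o
    return w, b


def dagger(n_iters=5, episodes=8, horizon=20, seed=0):
    """Generic DAgger loop on a toy 1-D lane-keeping problem."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # untrained student: always outputs action 0
    dataset = []     # aggregated (observation, teacher action) pairs
    for _ in range(n_iters):
        for _ in range(episodes):
            x = rng.uniform(-1.0, 1.0)  # initial lateral offset
            for _ in range(horizon):
                obs = render(x)
                # The teacher labels the state that the student itself reached.
                dataset.append((obs, teacher_action(x)))
                # The environment is stepped with the student's action (on-policy).
                action = w * obs + b
                x = x + action + rng.gauss(0.0, 0.05)
        # Refit the student on all data collected so far.
        w, b = fit_linear(dataset)
    return w, b


w, b = dagger()
print(w, b)
```

In this toy setting the teacher's action is an exact affine function of the observation, so the fit should recover approximately w = -K/2 and b = K/2. The point of the sketch is the data flow: labels come from the privileged teacher, while the visited states come from the student's own rollouts.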


Related papers