Fugu-MT Paper Translation (Summary): Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

Paper overview: Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

arxiv url: http://arxiv.org/abs/2603.13842v1
Date: Sat, 14 Mar 2026 08:53:47 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.438846
Title: Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving
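Title (reference translation): Fine-tuning Is Not Sufficient: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-End Autonomous Driving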
Authors: Zhexi Lian, Haoran Wang, Xuerun Yan, Weimeng Lin, Xianhong Zhang, Yongyu Chen, Jia Hu
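Abstract summary: PaIR-Drive is a general framework for collaborative imitation and reinforcement learning in end-to-end autonomous driving. During training, PaIR-Drive separates IL and RL into two parallel branches. PaIR-Drive consistently outperforms existing RL fine-tuning methods and may even correct the suboptimal behaviors of human experts.
Reference score (attention score computed by the site): 7.691237575352413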
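The idea of two parallel branches with non-conflicting objectives can be illustrated with a one-dimensional toy. This is a hypothetical sketch, not the authors' architecture: the abstract does not say how the RL branch combines with the IL output, so here we assume an RL residual added to the IL plan, and the constants `EXPERT_OFFSET`, `REWARD_OPTIMUM` and the function names are invented for illustration. Each branch owns its own parameter and its own loss, and the RL branch treats the IL output as a constant (a stop-gradient), so neither objective pushes on the other branch's parameters.

```python
from typing import Tuple

EXPERT_OFFSET = 1.0   # hypothetical, deliberately suboptimal human demonstration
REWARD_OPTIMUM = 0.6  # hypothetical offset that maximizes the toy reward


def reward(action: float) -> float:
    """Toy reward peaking at REWARD_OPTIMUM (an assumption, not a NAVSIM metric)."""
    return -(action - REWARD_OPTIMUM) ** 2


def train_parallel(steps: int = 200, lr: float = 0.1) -> Tuple[float, float]:
    """Update IL and RL parameters side by side, each with its own objective."""
    theta_il, theta_rl = 0.0, 0.0  # separate parameters for each branch
    for _ in range(steps):
        il_plan = theta_il  # IL output, held fixed from the RL branch's point of view
        # IL objective: squared error to the demonstration; touches theta_il only.
        grad_il = 2.0 * (theta_il - EXPERT_OFFSET)
        # RL objective: reward of the refined plan il_plan + theta_rl. Because
        # il_plan is treated as a constant, this gradient touches theta_rl only.
        grad_rl = -2.0 * (il_plan + theta_rl - REWARD_OPTIMUM)
        theta_il -= lr * grad_il  # gradient descent on the IL loss
        theta_rl += lr * grad_rl  # gradient ascent on the reward
    return theta_il, theta_rl


theta_il, theta_rl = train_parallel()
print(f"IL plan ~ {theta_il:.3f}, refined plan ~ {theta_il + theta_rl:.3f}")
```

In this toy the IL branch converges to the demonstration (about 1.0) while the refined plan converges to the reward optimum (about 0.6), which mirrors, in miniature, the abstract's claim that RL can move the final plan beyond what the demonstrations alone encode.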
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: End-to-end autonomous driving is typically built upon imitation learning (IL), yet its performance is constrained by the quality of human demonstrations. To overcome this limitation, recent methods incorporate reinforcement learning (RL) through sequential fine-tuning. However, such a paradigm remains suboptimal: sequential RL fine-tuning can introduce policy drift and often leads to a performance ceiling due to its dependence on the pretrained IL policy. To address these issues, we propose PaIR-Drive, a general Parallel framework for collaborative Imitation and Reinforcement learning in end-to-end autonomous driving. During training, PaIR-Drive separates IL and RL into two parallel branches with conflict-free training objectives, enabling fully collaborative optimization. This design eliminates the need to retrain RL when applying a new IL policy. During inference, RL leverages the IL policy to further optimize the final plan, allowing performance beyond the prior knowledge of IL. Furthermore, we introduce a tree-structured trajectory neural sampler into group relative policy optimization (GRPO) in the RL branch, which enhances exploration capability. Extensive analysis on the NAVSIM v1 and v2 benchmarks demonstrates that PaIR-Drive achieves competitive performance of 91.2 PDMS and 87.9 EPDMS, building upon the Transfuser and DiffusionDrive IL baselines. PaIR-Drive consistently outperforms existing RL fine-tuning methods, and could even correct human experts' suboptimal behaviors. Qualitative results further confirm that PaIR-Drive can effectively explore and generate high-quality trajectories.
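The two exploration ingredients named above can be sketched as follows. The group-relative advantage is the standard GRPO idea (normalize each sample's reward by its group's mean and standard deviation, so no learned critic is needed); the tree sampler is only a hypothetical stand-in, since the abstract does not describe the neural sampler. Here each tree level picks one of `branching` lateral offsets for a trajectory segment, so sibling leaves share their prefix. The names `tree_sample_paths`, `cumulative`, the offset grid, and the toy reward are all assumptions.

```python
import itertools
import math
from typing import List, Sequence


def tree_sample_paths(branching: int, depth: int, step: float) -> List[List[float]]:
    """Enumerate every root-to-leaf path of a full tree (hypothetical sampler).

    Each level chooses a lateral offset for one segment; leaves under the same
    ancestor share the same prefix of segment choices.
    """
    # Offsets centred on zero, e.g. branching=3, step=0.5 -> [-0.5, 0.0, 0.5]
    choices = [(i - (branching - 1) / 2) * step for i in range(branching)]
    return [list(path) for path in itertools.product(choices, repeat=depth)]


def cumulative(path: Sequence[float]) -> List[float]:
    """Lateral position after each segment (running sum of offsets)."""
    positions, pos = [], 0.0
    for offset in path:
        pos += offset
        positions.append(pos)
    return positions


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: reward minus group mean, divided by group std."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


paths = tree_sample_paths(branching=3, depth=2, step=0.5)  # a group of 9 trajectories
# Hypothetical reward: end 0.5 m left of the lane centre.
rewards = [-abs(cumulative(p)[-1] - 0.5) for p in paths]
advantages = group_relative_advantages(rewards)
```

In GRPO these advantages would then weight the log-probabilities of the sampled trajectories in a clipped policy-gradient update; that update is omitted here. Sharing prefixes is one plausible reason a tree layout helps exploration: a single group covers many distinct endpoints while siblings differ only in their later segments.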
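Abstract (reference translation): End-to-end autonomous driving is usually built on imitation learning (IL), but its performance is constrained by the quality of human demonstrations. To overcome this limitation, recent methods incorporate reinforcement learning (RL) through sequential fine-tuning. Sequential RL fine-tuning, however, can introduce policy drift and, because it depends on the pretrained IL policy, often runs into a performance ceiling. To address these issues, we propose PaIR-Drive, a general parallel framework for collaborative imitation and reinforcement learning in end-to-end autonomous driving. During training, PaIR-Drive separates IL and RL into two parallel branches with conflict-free training objectives, enabling fully collaborative optimization. This design removes the need to retrain RL when a new IL policy is applied. During inference, RL uses the IL policy to further optimize the final plan, achieving performance beyond the prior knowledge of IL. In addition, we introduce a tree-structured trajectory neural sampler into group relative policy optimization (GRPO) in the RL branch to improve exploration. Extensive analysis on the NAVSIM v1 and v2 benchmarks shows that PaIR-Drive achieves competitive performance of 91.2 PDMS and 87.9 EPDMS, building on the Transfuser and DiffusionDrive IL baselines. PaIR-Drive consistently outperforms existing RL fine-tuning methods and may even correct suboptimal behaviors of human experts. Qualitative results further confirm that PaIR-Drive can effectively explore and generate high-quality trajectories.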
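To make the reported "91.2 PDMS" concrete, here is a sketch of how NAVSIM v1's PDM Score combines its sub-scores, as we understand the benchmark's definition: two multiplicative penalties (no at-fault collision, NC; drivable area compliance, DAC) multiply a weighted average of ego progress (EP, weight 5), time-to-collision (TTC, weight 5) and comfort (C, weight 2). The per-scenario scores are averaged and reported on a 0 to 100 scale. The function names are hypothetical, and EPDMS (v2) adds further sub-metrics that are not shown; check the NAVSIM documentation before relying on the exact weights.

```python
from statistics import fmean
from typing import Sequence

# Weights of the averaged sub-scores in NAVSIM v1's PDMS (as assumed above).
EP_WEIGHT, TTC_WEIGHT, COMFORT_WEIGHT = 5.0, 5.0, 2.0


def pdm_score(nc: float, dac: float, ep: float, ttc: float, comfort: float) -> float:
    """Per-scenario PDM Score: multiplicative penalties times a weighted average."""
    for value in (nc, dac, ep, ttc, comfort):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"sub-scores must lie in [0, 1], got {value}")
    weighted = (EP_WEIGHT * ep + TTC_WEIGHT * ttc + COMFORT_WEIGHT * comfort) / (
        EP_WEIGHT + TTC_WEIGHT + COMFORT_WEIGHT
    )
    # A collision or leaving the drivable area zeroes (or scales down) the whole score.
    return nc * dac * weighted


def benchmark_pdms(scenario_scores: Sequence[float]) -> float:
    """Mean per-scenario score scaled to 0-100, the form in which PDMS is reported."""
    return 100.0 * fmean(scenario_scores)


# A safe, comfortable plan with 80% ego progress scores (5*0.8 + 5 + 2) / 12 = 11/12.
print(pdm_score(nc=1.0, dac=1.0, ep=0.8, ttc=1.0, comfort=1.0))
```

The multiplicative structure explains why safety failures dominate the metric: no amount of progress or comfort can compensate for a zeroed NC or DAC term.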


Related papers