Fugu-MT Paper Translation (Abstract): On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning

Paper abstract: On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning

arxiv url: http://arxiv.org/abs/2604.07944v1
Date: Thu, 09 Apr 2026 08:06:19 GMT
Status: Translation complete
System update date: 2026-04-10 18:34:05.788219
Title: On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning
Title (reference translation): On-Policy Distillation of Language Models for Autonomous Driving Planning
Authors: Amirhossein Afsharrad, Amirhesam Abedsoltan, Ahmadreza Moradipari, Sanjay Lall
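Abstract summary: Large language models (LLMs) have recently demonstrated strong potential for autonomous vehicle motion planning. This work studies how to effectively transfer motion planning knowledge from a large teacher LLM to a smaller, more deployable student model.
Reference score (independently computed attention measure): 3.2748787252933442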
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have recently demonstrated strong potential for autonomous vehicle motion planning by reformulating trajectory prediction as a language generation problem. However, deploying capable LLMs in resource-constrained onboard systems remains a fundamental challenge. In this paper, we study how to effectively transfer motion planning knowledge from a large teacher LLM to a smaller, more deployable student model. We build on the GPT-Driver framework, which represents driving scenes as language prompts and generates waypoint trajectories with chain-of-thought reasoning, and investigate two student training paradigms: (i) on-policy generalized knowledge distillation (GKD), which trains the student on its own self-generated outputs using dense token-level feedback from the teacher, and (ii) a dense-feedback reinforcement learning (RL) baseline that uses the teacher's log-probabilities as per-token reward signals in a policy gradient framework. Experiments on the nuScenes benchmark show that GKD substantially outperforms the RL baseline and closely approaches teacher-level performance despite a 5$\times$ reduction in model size. These results highlight the practical value of on-policy distillation as a principled and effective approach to deploying LLM-based planners in autonomous driving systems.
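Abstract (reference translation): Large language models (LLMs) have shown strong potential for autonomous vehicle motion planning by reformulating trajectory prediction as a language generation problem. However, deploying capable LLMs on resource-constrained onboard systems remains a fundamental challenge. This paper studies how to effectively transfer motion planning knowledge from a large teacher LLM to a smaller, more deployable student model. The work builds on the GPT-Driver framework, which represents driving scenes as language prompts and generates waypoint trajectories with chain-of-thought reasoning, and investigates two student training paradigms: (i) on-policy generalized knowledge distillation (GKD), which trains the student on its own self-generated outputs using dense token-level feedback from the teacher, and (ii) a dense-feedback reinforcement learning (RL) baseline that uses the teacher's log-probabilities as per-token reward signals in a policy gradient framework. Experiments on the nuScenes benchmark show that GKD substantially outperforms the RL baseline and approaches teacher-level performance despite a 5$\times$ reduction in model size. These results highlight the practical value of on-policy distillation as a principled and effective approach to deploying LLM-based planners in autonomous driving systems.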
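
The two training signals described in the abstract can be illustrated with a toy sketch in pure Python. It is not the authors' implementation. The abstract does not state which divergence GKD minimizes, so reverse KL (student relative to teacher) is assumed here as one common choice. All function names, the three-token vocabulary, and the logit values are hypothetical. The sketch only evaluates the objectives. An actual training loop would differentiate them with respect to the student parameters and treat the rewards as constants.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at one position (assumed divergence)."""
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(s) * (s - t) for s, t in zip(ls, lt))


def gkd_loss(student_seq_logits, teacher_seq_logits):
    """Mean per-token divergence over a student-generated sequence."""
    kls = [reverse_kl(s, t)
           for s, t in zip(student_seq_logits, teacher_seq_logits)]
    return sum(kls) / len(kls)


def dense_rewards(sampled_tokens, teacher_seq_logits):
    """Per-token reward: teacher log-probability of each sampled token."""
    return [log_softmax(t)[tok]
            for tok, t in zip(sampled_tokens, teacher_seq_logits)]


def pg_surrogate(sampled_tokens, student_seq_logits, rewards):
    """Policy-gradient surrogate: -mean(reward_t * log pi_student(y_t))."""
    logps = [log_softmax(s)[tok]
             for tok, s in zip(sampled_tokens, student_seq_logits)]
    return -sum(r * lp for r, lp in zip(rewards, logps)) / len(logps)


# Hypothetical logits for a two-token sequence sampled from the student.
student = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
teacher = [[1.8, 0.7, 0.2], [0.0, 0.0, 0.0]]
sampled = [0, 1]  # token ids the student actually generated

loss = gkd_loss(student, teacher)
rewards = dense_rewards(sampled, teacher)
surrogate = pg_surrogate(sampled, student, rewards)
```

Both signals are dense in the sense that every generated token receives feedback. The distillation loss compares full next-token distributions at each position, while the reward-based variant only scores the single token that was sampled.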

Related papers