Fugu-MT Paper Translation (Abstract): CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving

Paper overview: CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving

arxiv url: http://arxiv.org/abs/2603.15771v1
Date: Mon, 16 Mar 2026 18:03:21 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:06.930671
Title: CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving
Title (reference translation): Correction Planner: A Self-Correcting Planner Using Reinforcement Learning for Autonomous Driving
Authors: Yihong Guo, Dongqiangzi Ye, Sijia Chen, Anqi Liu, Xianming Liu,
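Abstract summary: CorrectionPlanner is an autoregressive planner with self-correction. It reduces collision rate by over 20% on Waymax and achieves state-of-the-art planning scores on nuPlan.
Reference score (independently computed attention measure): 55.88697462014118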
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autonomous driving requires safe planning, but most learning-based planners lack explicit self-correction ability: once an unsafe action is proposed, there is no mechanism to correct it. Thus, we propose CorrectionPlanner, an autoregressive planner with self-correction that models planning as motion-token generation within a propose, evaluate, and correct loop. At each planning step, the policy proposes an action, namely a motion token, and a learned collision critic predicts whether it will induce a collision within a short horizon. If the critic predicts a collision, we retain the sequence of historical unsafe motion tokens as a self-correction trace, generate the next motion token conditioned on it, and repeat this process until a safe motion token is proposed or the safety criterion is met. This self-correction trace, consisting of all unsafe motion tokens, represents the planner's correction process in motion-token space, analogous to a reasoning trace in language models. We train the planner with imitation learning followed by model-based reinforcement learning using rollouts from a pretrained world model that realistically models agents' reactive behaviors. Closed-loop evaluations show that CorrectionPlanner reduces collision rate by over 20% on Waymax and achieves state-of-the-art planning scores on nuPlan.
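Abstract (reference translation): Autonomous driving requires safe planning, but most learning-based planners lack an explicit self-correction ability. We therefore propose CorrectionPlanner, an autoregressive planner with self-correction. At each planning step, the policy proposes an action, called a motion token, and a learned collision critic predicts whether it will cause a collision within a short horizon. If the critic predicts a collision, the sequence of past unsafe motion tokens is retained as a self-correction trace, the next motion token is generated conditioned on it, and the process repeats until a safe motion token is proposed or the safety criterion is met. This self-correction trace, made up of all the unsafe motion tokens, represents the planner's correction process in motion-token space, analogous to a reasoning trace in language models. The planner is trained with imitation learning followed by model-based reinforcement learning, using rollouts from a pretrained world model that realistically models agents' reactive behaviors. In closed-loop evaluation, CorrectionPlanner reduces the collision rate by over 20% on Waymax and achieves state-of-the-art planning scores on nuPlan.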
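
The propose, evaluate, and correct loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `plan_step`, `propose`, `predicts_collision`, and `max_corrections` are hypothetical, the policy and critic are toy stand-ins for the learned models, and the cap on correction attempts is an assumed stand-in for the safety criterion, which the abstract does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Token = int  # a motion token, represented here as a plain integer (assumption)


def plan_step(
    propose: Callable[[Sequence[Token], Sequence[Token]], Token],
    predicts_collision: Callable[[Sequence[Token], Token], bool],
    history: Sequence[Token],
    max_corrections: int = 8,  # assumed stopping rule, not from the paper
) -> Tuple[Token, List[Token]]:
    """Run one planning step and return (chosen token, self-correction trace)."""
    trace: List[Token] = []  # unsafe tokens rejected so far in this step
    token = propose(history, trace)
    # Keep correcting while the critic flags the proposal and budget remains.
    while predicts_collision(history, token) and len(trace) < max_corrections:
        trace.append(token)              # retain the unsafe token in the trace
        token = propose(history, trace)  # re-propose, conditioned on the trace
    # If the budget runs out, the returned token may still be flagged unsafe.
    return token, trace


def toy_propose(history: Sequence[Token], trace: Sequence[Token]) -> Token:
    """Toy policy: pick the lowest of five tokens not already rejected."""
    return next(t for t in range(5) if t not in trace)


def toy_critic(history: Sequence[Token], token: Token) -> bool:
    """Toy critic: treat tokens 0 and 1 as leading to a collision."""
    return token < 2


if __name__ == "__main__":
    token, trace = plan_step(toy_propose, toy_critic, history=[])
    print(token, trace)
```

In this toy setup the first two proposals are rejected and kept in the trace, and the third is accepted. A real system would replace the two callables with the learned policy and collision critic.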

Related papers