Fugu-MT Paper Translation (Abstract): HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving

Paper overview: HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving

arxiv url: http://arxiv.org/abs/2505.15793v1
Date: Wed, 21 May 2025 17:47:24 GMT
Status: Translation complete
Last updated in system: 2025-05-22 15:42:59.812562
Title: HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Title (reference translation): HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Authors: Zhiwen Chen, Bo Leng, Zhuoren Li, Hanming Deng, Guizhe Jin, Ran Yu, Huanxi Wen
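Abstract summary: Integrating Large Language Models (LLMs) with Reinforcement Learning (RL) can improve autonomous driving (AD) performance in complex scenarios. However, current LLM-Dominated RL methods over-rely on LLM outputs, which are prone to hallucinations. This paper proposes an LLM-Hinted RL paradigm to address the hallucination problem.
Reference score (independently computed attention measure): 4.340881027724334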
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Integrating Large Language Models (LLMs) with Reinforcement Learning (RL) can enhance autonomous driving (AD) performance in complex scenarios. However, current LLM-Dominated RL methods over-rely on LLM outputs, which are prone to hallucinations. Evaluations show that a state-of-the-art LLM achieves a non-hallucination rate of only approximately 57.95% when assessed on essential driving-related tasks. Thus, in these methods, hallucinations from the LLM can directly jeopardize the performance of driving policies. This paper argues that maintaining relative independence between the LLM and the RL is vital for solving the hallucination problem. Consequently, this paper proposes a novel LLM-Hinted RL paradigm. The LLM is used to generate semantic hints for state augmentation and policy optimization to assist the RL agent in motion planning, while the RL agent counteracts potentially erroneous semantic indications through policy learning to achieve excellent driving performance. Based on this paradigm, we propose the HCRMP (LLM-Hinted Contextual Reinforcement Learning Motion Planner) architecture, which includes an Augmented Semantic Representation Module to extend the state space. A Contextual Stability Anchor Module enhances the reliability of multi-critic weight hints by utilizing information from the knowledge base. A Semantic Cache Module is employed to seamlessly integrate LLM low-frequency guidance with RL high-frequency control. Extensive experiments in CARLA validate HCRMP's strong overall driving performance. HCRMP achieves a task success rate of up to 80.3% under diverse driving conditions with different traffic densities. Under safety-critical driving conditions, HCRMP significantly reduces the collision rate by 11.4%, which effectively improves the driving performance in complex scenarios.
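Abstract (reference translation): Integrating large language models (LLMs) with reinforcement learning (RL) can improve autonomous driving (AD) performance in complex scenarios. However, current LLM-Dominated RL methods depend excessively on LLM outputs, which are prone to hallucination; evaluations show that a state-of-the-art LLM reaches a non-hallucination rate of only about 57.95% on essential driving-related tasks. As a result, hallucinations from the LLM can directly degrade the performance of the driving policy. This paper argues that keeping the LLM and the RL relatively independent is essential to solving the hallucination problem, and therefore proposes an LLM-Hinted RL paradigm. The LLM generates semantic hints for state augmentation and policy optimization, while the RL agent counteracts potentially erroneous semantic indications through policy learning and achieves excellent driving performance. Based on this paradigm, the paper proposes the HCRMP (LLM-Hinted Contextual Reinforcement Learning Motion Planner) architecture, which includes an Augmented Semantic Representation Module that extends the state space. The Contextual Stability Anchor Module improves the reliability of multi-critic weight hints by using information from the knowledge base. The Semantic Cache Module integrates low-frequency LLM guidance with high-frequency RL control. Extensive experiments in CARLA validate HCRMP's strong overall driving performance. HCRMP achieves a task success rate of up to 80.3% under diverse driving conditions with different traffic densities. Under safety-critical driving conditions, HCRMP reduces the collision rate by 11.4%, effectively improving driving performance in complex scenarios.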
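
The abstract describes three ideas that a short sketch may make more concrete: caching a low-frequency LLM hint so a high-frequency RL loop can reuse it, appending the hint's semantic features to the state, and using hinted weights to combine multiple critics. The Python below is only an illustration under stated assumptions and is not the authors' implementation: the names `Hint`, `SemanticCache`, `augment_state`, and `blend_critics`, the fixed refresh interval, and the normalized weighted sum are all hypothetical choices, and the stand-in LLM is a plain function. The way the RL agent learns to counteract erroneous hints is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Hint:
    """Hypothetical LLM output: semantic features plus per-critic weights."""
    semantic: List[float]
    critic_weights: List[float]


class SemanticCache:
    """Reuses the latest hint between slow LLM queries (illustrative only)."""

    def __init__(self, query_llm: Callable[[Sequence[float]], Hint],
                 refresh_every: int) -> None:
        if refresh_every < 1:
            raise ValueError("refresh_every must be >= 1")
        self.query_llm = query_llm
        self.refresh_every = refresh_every
        self._hint: Optional[Hint] = None
        self._age = 0          # control steps served by the cached hint
        self.llm_calls = 0     # how often the slow LLM was actually queried

    def get(self, observation: Sequence[float]) -> Hint:
        # Query the LLM only when no hint exists or the cached one is stale.
        if self._hint is None or self._age >= self.refresh_every:
            self._hint = self.query_llm(observation)
            self._age = 0
            self.llm_calls += 1
        self._age += 1
        return self._hint


def augment_state(observation: Sequence[float], hint: Hint) -> List[float]:
    """Extends the raw observation with the hint's semantic features."""
    return list(observation) + list(hint.semantic)


def blend_critics(values: Sequence[float], weights: Sequence[float]) -> float:
    """Combines critic values with normalized weights (uniform if degenerate)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        return sum(values) / len(values)
    return sum(v * w for v, w in zip(values, weights)) / total


if __name__ == "__main__":
    # Stand-in for an LLM call; a real system would query a model here.
    def fake_llm(obs: Sequence[float]) -> Hint:
        return Hint(semantic=[1.0, 0.0], critic_weights=[2.0, 1.0, 1.0])

    cache = SemanticCache(fake_llm, refresh_every=10)
    for _ in range(25):  # 25 high-frequency control steps
        hint = cache.get([0.5, -0.2])
        state = augment_state([0.5, -0.2], hint)
        value = blend_critics([1.0, 2.0, 3.0], hint.critic_weights)
    print(cache.llm_calls, state, value)
```

With a refresh interval of 10, the 25 control steps in the usage example trigger the stand-in LLM only at steps 1, 11, and 21, which is the kind of decoupling between slow guidance and fast control that the abstract attributes to the Semantic Cache Module.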


Related papers