Fugu-MT Paper Translation (Abstract): MapKD: Unlocking Prior Knowledge with Cross-Modal Distillation for Efficient Online HD Map Construction

Paper overview: MapKD: Unlocking Prior Knowledge with Cross-Modal Distillation for Efficient Online HD Map Construction

arxiv url: http://arxiv.org/abs/2508.15653v2
Date: Fri, 22 Aug 2025 01:44:26 GMT
Status: Translation complete
System update date: 2025-08-25 12:20:05.415695
Title: MapKD: Unlocking Prior Knowledge with Cross-Modal Distillation for Efficient Online HD Map Construction
Title (reference translation): MapKD: Unlocking Prior Knowledge through Cross-Modal Distillation for Efficient Online HD Map Construction
Authors: Ziyang Yan, Ruikai Li, Zhiyong Cui, Bohan Li, Han Jiang, Yilong Ren, Aoyong Li, Zhenning Li, Sijia Wen, Haiyang Yu,
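Abstract summary: MapKD is a multi-level cross-modal knowledge distillation framework with an innovative Teacher-Coach-Student (TCS) paradigm. The paper introduces two distillation strategies: Token-Guided 2D Patch Distillation (TGPD) for bird's-eye-view feature alignment and Masked Semantic Response Distillation (MSRD) for semantic learning guidance. Experiments on the challenging nuScenes dataset show that MapKD improves the student model by +6.68 mIoU and +10.94 mAP while simultaneously accelerating inference.
Reference score (independently calculated attention score): 23.156125781601528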
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Online HD map construction is a fundamental task in autonomous driving systems, aiming to acquire semantic information of map elements around the ego vehicle based on real-time sensor inputs. Recently, several approaches have achieved promising results by incorporating offline priors such as SD maps and HD maps or by fusing multi-modal data. However, these methods depend on stale offline maps and multi-modal sensor suites, resulting in avoidable computational overhead at inference. To address these limitations, we employ a knowledge distillation strategy to transfer knowledge from multimodal models with prior knowledge to an efficient, low-cost, and vision-centric student model. Specifically, we propose MapKD, a novel multi-level cross-modal knowledge distillation framework with an innovative Teacher-Coach-Student (TCS) paradigm. This framework consists of: (1) a camera-LiDAR fusion model with SD/HD map priors serving as the teacher; (2) a vision-centric coach model with prior knowledge and simulated LiDAR to bridge the cross-modal knowledge transfer gap; and (3) a lightweight vision-based student model. Additionally, we introduce two targeted knowledge distillation strategies: Token-Guided 2D Patch Distillation (TGPD) for bird's-eye-view feature alignment and Masked Semantic Response Distillation (MSRD) for semantic learning guidance. Extensive experiments on the challenging nuScenes dataset demonstrate that MapKD improves the student model by +6.68 mIoU and +10.94 mAP while simultaneously accelerating inference speed. The code is available at: https://github.com/2004yan/MapKD2026.
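Abstract (reference translation): Online HD map construction is a fundamental task in autonomous driving systems that aims to acquire semantic information about the map elements around the ego vehicle from real-time sensor inputs. Recently, several approaches have achieved promising results by incorporating offline priors such as SD maps and HD maps, or by fusing multi-modal data. However, these methods rely on stale offline maps and multi-modal sensor suites, which incurs avoidable computational overhead at inference time. To address these limitations, the authors use a knowledge distillation strategy to transfer knowledge from multi-modal models with prior knowledge to an efficient, low-cost, vision-centric student model. Specifically, they propose MapKD, a novel multi-level cross-modal knowledge distillation framework with a Teacher-Coach-Student (TCS) paradigm. The framework consists of (1) a camera-LiDAR fusion model with SD/HD map priors that serves as the teacher, (2) a vision-centric coach model with prior knowledge and simulated LiDAR that bridges the cross-modal knowledge transfer gap, and (3) a lightweight vision-based student model. In addition, they introduce two targeted knowledge distillation strategies: Token-Guided 2D Patch Distillation (TGPD) for bird's-eye-view feature alignment and Masked Semantic Response Distillation (MSRD) for semantic learning guidance. Extensive experiments on the challenging nuScenes dataset show that MapKD improves the student model by +6.68 mIoU and +10.94 mAP while simultaneously accelerating inference. The code is available at https://github.com/2004yan/MapKD2026.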
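
The abstract names two kinds of distillation signal: feature alignment on bird's-eye-view (BEV) features and a masked, response-level signal on semantic predictions. It does not spell out the loss formulas, so the following is only a generic sketch of those two ideas, not the authors' TGPD or MSRD. Every function name, the 0/1 mask, and the choice of mean squared error and KL divergence are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def feature_mse(teacher_feats, student_feats):
    """Feature-level signal (assumed form): mean squared error between
    two flattened BEV feature vectors of the same length."""
    assert len(teacher_feats) == len(student_feats)
    squared = [(t - s) ** 2 for t, s in zip(teacher_feats, student_feats)]
    return sum(squared) / len(squared)


def masked_response_kd(teacher_logits, student_logits, mask, temperature=1.0):
    """Response-level signal (assumed form): average KL(teacher || student)
    over the BEV cells whose mask flag is 1.

    teacher_logits, student_logits: one list of class logits per BEV cell.
    mask: hypothetical 0/1 flags, e.g. cells that contain map elements.
    """
    selected = [i for i, flag in enumerate(mask) if flag]
    if not selected:
        return 0.0  # nothing to distill on when no cell is selected
    total = 0.0
    for i in selected:
        p = softmax(teacher_logits[i], temperature)
        q = softmax(student_logits[i], temperature)
        total += kl_divergence(p, q)
    return total / len(selected)
```

In a Teacher-Coach-Student setup such losses could in principle be applied between successive stages (teacher to coach, coach to student) and summed with the student's ordinary task loss. How MapKD actually combines and weights them is not stated in the abstract.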
