Fugu-MT Paper Translation (Abstract): Leveraging Previous-Traversal Point Cloud Map Priors for Camera-Based 3D Object Detection and Tracking

Paper overview: Leveraging Previous-Traversal Point Cloud Map Priors for Camera-Based 3D Object Detection and Tracking

arxiv url: http://arxiv.org/abs/2604.25405v1
Date: Tue, 28 Apr 2026 09:16:44 GMT
Status: Translation complete
Last updated in system: 2026-04-29 16:49:17.791183
Title: Leveraging Previous-Traversal Point Cloud Map Priors for Camera-Based 3D Object Detection and Tracking
Title (reference translation): Using Prior-Traversal Point Cloud Maps for Camera-Based 3D Object Detection and Tracking
Authors: Markus Käppeler, Özgün Çiçek, Yakov Miron, Abhinav Valada
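Abstract summary: We propose DualViewMapDet, a camera-only inference framework that retrieves map priors online. The key idea is a dual-space camera-map fusion strategy that avoids one-sided view conversion. The code and pre-trained models are available at https://dualviewmapdet.cs.uni-freiburg.de.
Reference score (independently computed attention score): 15.914966195454403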
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Camera-based 3D object detection and tracking are central to autonomous driving, yet precise 3D object localization remains fundamentally constrained by depth ambiguity when no expensive, depth-rich online LiDAR is available at inference. In many deployments, however, vehicles repeatedly traverse the same environments, making static point cloud maps from prior traversals a practical source of geometric priors. We propose DualViewMapDet, a camera-only inference framework that retrieves such map priors online and leverages them to mitigate the absence of a LiDAR sensor during deployment. The key idea is a dual-space camera-map fusion strategy that avoids one-sided view conversion. Specifically, we (i) project the map into perspective view (PV) and encode multi-channel geometric cues to enrich image features and support BEV lifting, and (ii) encode the map directly in bird's-eye view (BEV) with a sparse voxel backbone and fuse it with lifted camera features in a shared metric space. Extensive evaluations on nuScenes and Argoverse 2 demonstrate consistent improvements over strong camera-only baselines, with particularly strong gains in object localization. Ablations further validate the contributions of PV/BEV fusion and prior-map coverage. We make the code and pre-trained models available at https://dualviewmapdet.cs.uni-freiburg.de .
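Abstract (reference translation): Camera-based 3D object detection and tracking are central to autonomous driving, but precise 3D object localization is fundamentally constrained by depth ambiguity when an expensive, depth-rich online LiDAR is not available at inference. In many deployments, however, vehicles traverse the same environments repeatedly, which makes static point cloud maps from prior traversals a practical source of geometric priors. We propose DualViewMapDet, a camera-only inference framework that retrieves such map priors online and uses them to mitigate the absence of a LiDAR sensor during deployment. The key idea is a dual-space camera-map fusion strategy that avoids one-sided view conversion. Specifically, we (i) project the map into perspective view (PV) and encode multi-channel geometric cues to enrich image features and support BEV lifting, and (ii) encode the map directly in bird's-eye view (BEV) with a sparse voxel backbone and fuse it with the lifted camera features in a shared metric space. Large-scale evaluations on nuScenes and Argoverse 2 show consistent improvements over strong camera-only baselines, especially in object localization. Ablations further validate the contributions of PV/BEV fusion and prior-map coverage. The code and pre-trained models are available at https://dualviewmapdet.cs.uni-freiburg.de.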
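
The two map views named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the projection model, the geometric channels, or the grid parameters. The sketch assumes a pinhole camera, map points already transformed into the camera frame (for PV) and the ego frame (for BEV), and it produces only a single nearest-depth channel and a point-count grid. The function names `project_map_to_pv` and `rasterize_map_to_bev` and all parameters are hypothetical.

```python
import math


def project_map_to_pv(points_cam, fx, fy, cx, cy, width, height):
    """Project map points (camera frame, z forward) into a sparse depth image.

    Assumption: ideal pinhole camera with intrinsics fx, fy, cx, cy.
    Returns a height x width grid with the nearest depth per pixel,
    or None where no map point lands.
    """
    depth = [[None] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0.0:  # point is behind the camera, skip it
            continue
        u = int(math.floor(fx * x / z + cx))
        v = int(math.floor(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # keep the closest point when several fall on one pixel
            if depth[v][u] is None or z < depth[v][u]:
                depth[v][u] = z
    return depth


def rasterize_map_to_bev(points_ego, x_range, y_range, cell_size):
    """Count map points per cell on a metric BEV grid in the ego frame.

    Assumption: a dense count grid stands in for the sparse voxel
    encoding mentioned in the abstract.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = int(round((x_max - x_min) / cell_size))
    n_rows = int(round((y_max - y_min) / cell_size))
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y, _z in points_ego:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # outside the BEV extent
        col = min(int((x - x_min) / cell_size), n_cols - 1)
        row = min(int((y - y_min) / cell_size), n_rows - 1)
        grid[row][col] += 1
    return grid


# Toy usage with made-up points and intrinsics.
pv_depth = project_map_to_pv(
    [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (1.0, 0.0, -2.0)],
    fx=100.0, fy=100.0, cx=2.0, cy=2.0, width=4, height=4,
)
bev_counts = rasterize_map_to_bev(
    [(0.5, 0.5, 0.0), (0.6, 0.4, 1.0), (-1.5, 1.5, 0.0), (10.0, 0.0, 0.0)],
    x_range=(-2.0, 2.0), y_range=(-2.0, 2.0), cell_size=1.0,
)
```

In this sketch the PV output lives in pixel coordinates, where it could be stacked with image features, while the BEV output lives on a metric grid, which is the kind of space the abstract describes for fusing the map with lifted camera features.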


Related papers