Fugu-MT Paper Translation (Abstract): Open-Vocabulary BEV Segmentation with 3D-Aware Geometric Constraints

Paper overview: Open-Vocabulary BEV Segmentation with 3D-Aware Geometric Constraints

arxiv url: http://arxiv.org/abs/2606.24353v1
Date: Tue, 23 Jun 2026 09:43:12 GMT
Status: Translation complete
Last updated in system: 2026-06-24 22:16:48.86472
Title: Open-Vocabulary BEV Segmentation with 3D-Aware Geometric Constraints
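Title (reference translation): Open-Vocabulary BEV Segmentation with Geometric Constraints
Authors: Hojun Choi, Seulbin Hwang, Dae Jung Kim, Kisung Kim, Hyunjung Shim, Jinhan Lee
Abstract summary: The paper introduces open-vocabulary BEV segmentation (OVBS), which recognizes categories beyond the training set. OVBEVSeg is a geometry-aware OVBS framework that enhances efficient Gaussian splatting (GS)-based unprojection. On the nuScenes dataset, OVBEVSeg achieves state-of-the-art performance, outperforming closed-set methods by 15.3 mIoU on unseen categories.
Reference score (independently computed attention measure): 20.26519299938903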
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Bird's-eye view (BEV) perception fuses multi-camera images into a unified top-down representation for autonomous driving. Despite recent progress, state-of-the-art methods remain confined to closed-set scenarios, making them vulnerable to unpredictable real-world environments. In this work, we introduce open-vocabulary BEV segmentation (OVBS), which leverages vision-language models (VLMs) to recognize categories beyond the training set while maintaining precise BEV perception and real-time efficiency. A key challenge in OVBS lies in the 3D geometric inconsistency inherent in the ill-posed lifting of 2D VLM semantics into BEV. To address this, we propose OVBEVSeg, a geometry-aware OVBS framework that enhances efficient Gaussian splatting (GS)-based unprojection by leveraging robust 3D geometric constraints across three progressive stages: (1) 2D-to-BEV pseudo-labeling via reliable 3D projection for OV generalization; (2) joint 2D-BEV per-scene optimization with BEV structural constraints for 3D geometric consistency; and (3) 3D geometric distillation for online efficiency. On the nuScenes dataset, OVBEVSeg achieves state-of-the-art performance, outperforming closed-set methods by 15.3 mIoU on unseen categories. Remarkably, even with no novel-class ground-truth labels, it remains competitive with self- and semi-supervised baselines trained with up to 40% of ground-truth annotations. Furthermore, it achieves 2.5x faster inference with only 0.22x the memory consumption of projection-based methods. Project page: https://hchoi256.github.io/projects/ovbevseg/.
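Abstract (reference translation): Bird's-eye view (BEV) perception fuses multi-camera images into a unified top-down representation for autonomous driving. Despite recent progress, state-of-the-art methods are limited to closed-set scenarios and are vulnerable to unpredictable real-world environments. This work introduces open-vocabulary BEV segmentation (OVBS), which uses vision-language models (VLMs) to recognize categories beyond the training set while maintaining accurate BEV perception and real-time efficiency. A key challenge in OVBS is the 3D geometric inconsistency inherent in lifting 2D VLM semantics into BEV. OVBEVSeg enhances efficient Gaussian splatting (GS)-based unprojection by leveraging robust 3D geometric constraints across three stages: (1) 2D-to-BEV pseudo-labeling via reliable 3D projection for OV generalization; (2) joint 2D-BEV per-scene optimization with BEV structural constraints for 3D geometric consistency; and (3) 3D geometric distillation for online efficiency. On the nuScenes dataset, OVBEVSeg achieves state-of-the-art performance, outperforming closed-set methods by 15.3 mIoU on unseen categories. Notably, even without novel-class ground-truth labels, it is competitive with self- and semi-supervised baselines trained with up to 40% of ground-truth annotations. It also achieves 2.5x faster inference with only 0.22x the memory consumption of projection-based methods. Project page: https://hchoi256.github.io/projects/ovbevseg/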


Related papers