Fugu-MT Paper Translation (Abstract): Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning

Paper summary: Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning

arxiv url: http://arxiv.org/abs/2207.01987v1
Date: Tue, 5 Jul 2022 12:13:52 GMT
Status: Translation complete
System update date: 2022-07-06 14:54:33.558575
Title: Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning
Authors: Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang
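Abstract summary: Current point-cloud detection methods struggle to detect open-vocabulary objects in the real world. We propose OV-3DETIC, an Open-Vocabulary 3D DETector using Image-level Class supervision.
Reference score (independently computed attention measure): 62.18197846270103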
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current point-cloud detection methods have difficulty detecting open-vocabulary objects in the real world because of their limited generalization capability. Moreover, collecting and fully annotating a point-cloud detection dataset with numerous object classes is extremely laborious and expensive, which limits the classes covered by existing point-cloud datasets and hinders models from learning the general representations needed for open-vocabulary point-cloud detection. As far as we know, we are the first to study the problem of open-vocabulary 3D point-cloud detection. Instead of seeking a point-cloud dataset with full labels, we resort to ImageNet1K to broaden the vocabulary of the point-cloud detector. We propose OV-3DETIC, an Open-Vocabulary 3D DETector using Image-level Class supervision. Specifically, we take advantage of two modalities, the image modality for recognition and the point-cloud modality for localization, to generate pseudo labels for unseen classes. We then propose a novel debiased cross-modal contrastive learning method to transfer knowledge from the image modality to the point-cloud modality during training. Without hurting inference latency, OV-3DETIC makes the point-cloud detector capable of open-vocabulary detection. Extensive experiments demonstrate that the proposed OV-3DETIC achieves absolute mAP improvements of at least 10.77% and 9.56% over a wide range of baselines on the SUN-RGBD and ScanNet datasets, respectively. In addition, we conduct experiments to shed light on why the proposed OV-3DETIC works.
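The abstract names cross-modal contrastive learning as the way knowledge moves from image features to point-cloud features, but it does not give the loss or say how the debiasing works. As a rough illustration only, the sketch below uses the standard InfoNCE loss as a stand-in, not the paper's debiased variant. The function names, the temperature value, and the toy feature vectors are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_modal_info_nce(point_feats, image_feats, temperature=0.1):
    """Standard (NOT debiased) InfoNCE loss over paired features.

    Assumption: point_feats[i] and image_feats[i] describe the same object,
    and every other image feature in the batch acts as a negative.
    """
    points = [l2_normalize(p) for p in point_feats]
    images = [l2_normalize(m) for m in image_feats]
    total = 0.0
    for i, p in enumerate(points):
        # Temperature-scaled cosine similarity to every image feature.
        logits = [sum(a * b for a, b in zip(p, m)) / temperature for m in images]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the matching pair (index i) as the target.
        total += log_sum - logits[i]
    return total / len(points)


# Toy 2-D features (hypothetical values, for illustration only).
image_feats = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # point features close to their paired images
swapped = [[0.1, 0.9], [0.9, 0.1]]   # point features close to the wrong images

loss_aligned = cross_modal_info_nce(aligned, image_feats)
loss_swapped = cross_modal_info_nce(swapped, image_feats)
print(loss_aligned < loss_swapped)
```

Minimizing a loss of this kind pulls each point-cloud feature toward its paired image feature and away from the others in the batch. Because the loss is applied only during training, this fits the abstract's claim that inference latency is unaffected.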

Related papers