Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
- URL: http://arxiv.org/abs/2304.00788v2
- Date: Wed, 17 May 2023 02:09:03 GMT
- Title: Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
- Authors: Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka,
Kurt Keutzer, Shanghang Zhang
- Abstract summary: The goal of open-vocabulary 3D point-cloud detection is to identify novel objects based on arbitrary textual descriptions.
We develop a point-cloud detector that can learn a general representation for localizing various objects.
We also propose a novel de-biased triplet cross-modal contrastive learning scheme to connect the image, point-cloud, and text modalities.
- Score: 62.18197846270103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of open-vocabulary detection is to identify novel objects based on
arbitrary textual descriptions. In this paper, we address open-vocabulary 3D
point-cloud detection with a divide-and-conquer strategy, which involves: 1)
developing a point-cloud detector that can learn a general representation for
localizing various objects, and 2) connecting textual and point-cloud
representations so that the detector can classify novel object categories
from text prompts. Specifically, we leverage rich image pre-trained
models, from which the point-cloud detector learns to localize objects under the
supervision of 2D bounding boxes predicted by pre-trained 2D detectors.
Moreover, we propose a novel de-biased triplet cross-modal contrastive learning
scheme to connect the image, point-cloud, and text modalities, thereby enabling the
point-cloud detector to benefit from vision-language pre-trained
models, i.e., CLIP. This novel use of image and vision-language pre-trained models
for point-cloud detectors allows for open-vocabulary 3D object detection
without the need for 3D annotations. Experiments demonstrate that the proposed
method improves over a wide range of baselines by at least 3.03 points on
ScanNet and 7.47 points on SUN RGB-D. Furthermore, we
provide a comprehensive analysis to explain why our approach works.
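To make the two ingredients above concrete, below is a minimal PyTorch sketch, not the authors' code: every tensor name, shape, and the assumed one-to-one matching between 3D predictions and 2D detector boxes are illustrative assumptions, and the paper's de-biasing of contrastive pairs is omitted. It shows (1) supervising 3D localization by projecting predicted 3D boxes into the image and comparing against 2D detector boxes, and (2) an InfoNCE-style triplet contrastive loss tying point-cloud features to CLIP image and text embeddings.

```python
# Illustrative sketch only; names, shapes, and the pre-matched pairing of
# 3D predictions to 2D boxes are assumptions, and the paper's de-biasing
# of contrastive pairs is not implemented here.
import torch
import torch.nn.functional as F

def project_corners(corners_3d: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """Project 3D box corners (N, 8, 3) in camera coordinates to pixels (N, 8, 2)."""
    uvw = corners_3d @ K.T                               # pinhole projection
    return uvw[..., :2] / uvw[..., 2:].clamp(min=1e-6)   # perspective divide

def box_projection_loss(corners_3d, boxes_2d, K):
    """Supervise 3D localization with 2D boxes: the projected extent of each
    predicted 3D box should match its (assumed pre-matched) detector box.
    boxes_2d: (N, 4) as [x1, y1, x2, y2]."""
    uv = project_corners(corners_3d, K)
    pred_2d = torch.cat([uv.min(dim=1).values, uv.max(dim=1).values], dim=-1)
    return F.l1_loss(pred_2d, boxes_2d)

def triplet_contrastive_loss(pc_emb, img_emb, txt_emb, tau=0.07):
    """InfoNCE-style loss over the three modalities: row i of each (B, D)
    tensor is assumed to describe the same object; image/text embeddings
    would come from a frozen CLIP model."""
    pc, im, tx = (F.normalize(e, dim=-1) for e in (pc_emb, img_emb, txt_emb))
    target = torch.arange(pc.size(0), device=pc.device)

    def nce(a, b):
        logits = a @ b.T / tau                            # (B, B) similarities
        return 0.5 * (F.cross_entropy(logits, target) +
                      F.cross_entropy(logits.T, target))

    # Pull point-cloud features toward the CLIP image and text spaces.
    return nce(pc, im) + nce(pc, tx) + nce(im, tx)
```

With point-cloud features aligned to CLIP's text space in this manner, classifying a novel category at test time reduces to comparing a detected object's embedding against the text embedding of an arbitrary prompt.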
Related papers
- Objects as Spatio-Temporal 2.5D points [5.588892124219713]
We propose a weakly supervised method that estimates the 3D position of objects by jointly learning to regress 2D object detections and the scene's depth prediction in a single feed-forward pass of a network.
Our method extends a single-point-based object detector and introduces a novel object representation in which each object is modeled as a BEV point spatio-temporally, without the need for any 3D or BEV annotations during training or LiDAR data at query time.
arXiv Detail & Related papers (2022-12-06T05:14:30Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection [114.54835359657707]
ProposalContrast is an unsupervised point cloud pre-training framework.
It learns robust 3D representations by contrasting region proposals.
ProposalContrast is validated on various 3D detectors.
arXiv Detail & Related papers (2022-07-26T04:45:49Z)
- Label-Guided Auxiliary Training Improves 3D Object Detector [32.96310946612949]
We propose a Label-Guided auxiliary training method for 3D object detection (LG3D).
Our proposed LG3D improves VoteNet by 2.5% and 3.1% mAP on the SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-07-24T14:22:21Z)
- Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning [62.18197846270103]
Current point-cloud detection methods have difficulty detecting open-vocabulary objects in the real world.
We propose OV-3DETIC, an Open-Vocabulary 3D DETector using Image-level Class supervision.
arXiv Detail & Related papers (2022-07-05T12:13:52Z)
- 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection [35.5386998382886]
3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description.
Previous methods mostly follow a two-stage paradigm, i.e., language-irrelevant detection and cross-modal matching.
We propose a 3D Single-Stage Referred Point Progressive Selection method, which progressively selects keypoints with the guidance of language and directly locates the target.
arXiv Detail & Related papers (2022-04-13T09:46:27Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on estimated point-wise foreground scores, a semantics-guided point sampling algorithm helps retain more important foreground points during down-sampling.
In practice, SASA proves effective at identifying valuable points related to foreground objects and at improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)