3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive
Selection
- URL: http://arxiv.org/abs/2204.06272v1
- Date: Wed, 13 Apr 2022 09:46:27 GMT
- Title: 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive
Selection
- Authors: Junyu Luo, Jiahui Fu, Xianghao Kong, Chen Gao, Haibing Ren, Hao Shen,
Huaxia Xia, Si Liu
- Abstract summary: 3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description.
Previous methods mostly follow a two-stage paradigm, i.e., language-irrelevant detection and cross-modal matching.
We propose a 3D Single-Stage Referred Point Progressive Selection method, which progressively selects keypoints with the guidance of language and directly locates the target.
- Score: 35.5386998382886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D visual grounding aims to locate the referred target object in 3D point
cloud scenes according to a free-form language description. Previous methods
mostly follow a two-stage paradigm, i.e., language-irrelevant detection and
cross-modal matching, which is limited by the isolated architecture. In such a
paradigm, the detector needs to sample keypoints from raw point clouds due to
the inherent properties of 3D point clouds (irregular and large-scale), to
generate the corresponding object proposal for each keypoint. However, sparse
proposals may leave out the target in detection, while dense proposals may
confuse the matching model. Moreover, the language-irrelevant detection stage
can only sample a small proportion of keypoints on the target, deteriorating
the target prediction. In this paper, we propose a 3D Single-Stage Referred
Point Progressive Selection (3D-SPS) method, which progressively selects
keypoints with the guidance of language and directly locates the target.
Specifically, we propose a Description-aware Keypoint Sampling (DKS) module to
coarsely focus on the points of language-relevant objects, which are
significant clues for grounding. Besides, we devise a Target-oriented
Progressive Mining (TPM) module to finely concentrate on the points of the
target, which is enabled by progressive intra-modal relation modeling and
inter-modal target mining. 3D-SPS bridges the gap between detection and
matching in the 3D visual grounding task, localizing the target at a single
stage. Experiments demonstrate that 3D-SPS achieves state-of-the-art
performance on both ScanRefer and Nr3D/Sr3D datasets.
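The coarse-to-fine, language-guided keypoint selection described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the dot-product scoring, the keep ratios, and the function names are assumptions for illustration only.

```python
def language_guided_sampling(point_feats, lang_feat, keep_ratio=0.5):
    """Score each point feature against the language feature and keep the
    top fraction (hypothetical stand-in for description-aware sampling)."""
    scores = [sum(p * l for p, l in zip(feat, lang_feat)) for feat in point_feats]
    k = max(1, int(len(scores) * keep_ratio))
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(order[:k])  # indices of the retained points

def progressive_selection(point_feats, lang_feat, stages=(0.5, 0.5)):
    """Apply the language-guided filter repeatedly, mimicking the
    coarse-to-fine progressive selection of keypoints."""
    idx = list(range(len(point_feats)))
    for ratio in stages:
        kept = language_guided_sampling([point_feats[i] for i in idx],
                                        lang_feat, ratio)
        idx = [idx[i] for i in kept]
    return idx
```

With two halving stages, 8 candidate points reduce to the 2 whose features align best with the language feature, which is the intuition behind letting the description drive sampling instead of a language-irrelevant detector.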
Related papers
- Open-Vocabulary Point-Cloud Object Detection without 3D Annotation [62.18197846270103]
The goal of open-vocabulary 3D point-cloud detection is to identify novel objects based on arbitrary textual descriptions.
We develop a point-cloud detector that can learn a general representation for localizing various objects.
We also propose a novel de-biased triplet cross-modal contrastive learning to connect the modalities of image, point-cloud and text.
arXiv Detail & Related papers (2023-04-03T08:22:02Z)
- PSA-Det3D: Pillar Set Abstraction for 3D object Detection [14.788139868324155]
We propose a pillar set abstraction (PSA) and foreground point compensation (FPC) to improve the detection performance for small objects.
The experiments on the KITTI 3D detection benchmark show that our proposed PSA-Det3D outperforms other algorithms with high accuracy for small object detection.
arXiv Detail & Related papers (2022-10-20T03:05:34Z)
- ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection [114.54835359657707]
ProposalContrast is an unsupervised point cloud pre-training framework.
It learns robust 3D representations by contrasting region proposals.
ProposalContrast is verified on various 3D detectors.
arXiv Detail & Related papers (2022-07-26T04:45:49Z)
- RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA shows to be effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- Group-Free 3D Object Detection via Transformers [26.040378025818416]
We present a simple yet effective method for directly detecting 3D objects from the 3D point cloud.
Our method computes the feature of an object from all the points in the point cloud with the help of an attention mechanism in Transformers (Vaswani et al., "Attention Is All You Need").
With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D.
arXiv Detail & Related papers (2021-04-01T17:59:36Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
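The step-wise refinement idea in the last entry (start from an initial prediction and change one 3D parameter per step toward the ground truth) can be sketched with a simple greedy loop. This is a stand-in for the learned reinforcement-learning policy, not the paper's method; the squared-error objective, step size, and function name are assumptions for illustration.

```python
def refine_box(pred, target, step=0.1, iters=50):
    """Greedily refine a box parameter vector: at each step, change the
    single parameter (by +/- step) that most reduces the squared error."""
    pred = list(pred)
    for _ in range(iters):
        best_err = sum((p - t) ** 2 for p, t in zip(pred, target))
        best_j, best_delta = None, 0.0
        for j in range(len(pred)):
            for delta in (-step, step):
                trial = pred[:]
                trial[j] += delta
                err = sum((p - t) ** 2 for p, t in zip(trial, target))
                if err < best_err:
                    best_err, best_j, best_delta = err, j, delta
        if best_j is None:  # no single-parameter move improves: stop
            break
        pred[best_j] += best_delta
    return pred
```

In the paper this per-step choice is made by a policy trained with delayed rewards rather than by direct access to the ground truth, which is what makes reinforcement learning necessary there.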
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.