See More and Know More: Zero-shot Point Cloud Segmentation via
Multi-modal Visual Data
- URL: http://arxiv.org/abs/2307.10782v1
- Date: Thu, 20 Jul 2023 11:32:51 GMT
- Title: See More and Know More: Zero-shot Point Cloud Segmentation via
Multi-modal Visual Data
- Authors: Yuhang Lu, Qi Jiang, Runnan Chen, Yuenan Hou, Xinge Zhu, Yuexin Ma
- Abstract summary: Zero-shot point cloud segmentation aims to make deep models capable of recognizing novel objects in point clouds that are unseen in the training phase.
We propose a novel multi-modal zero-shot learning method to better utilize the complementary information of point clouds and images for more accurate visual-semantic alignment.
- Score: 22.53879737713057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot point cloud segmentation aims to make deep models capable of
recognizing novel objects in point clouds that are unseen in the training phase.
Recent trends favor pipelines that transfer knowledge from seen classes with
labels to unseen classes without labels. These methods typically align visual
features with semantic features obtained from word embeddings under the
supervision of seen classes' annotations. However, point clouds contain limited
information with which to fully match semantic features. In fact, the rich
appearance information of images is a natural complement to textureless point
clouds, a complement that is not well explored in previous literature. Motivated
by this, we propose a novel multi-modal zero-shot learning method that better
utilizes the complementary information of point clouds and images for more
accurate visual-semantic alignment. Extensive experiments are performed on two
popular benchmarks, i.e., SemanticKITTI and nuScenes, and our method outperforms
current SOTA methods by an average of 52% and 49% in unseen-class mIoU,
respectively.
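The kind of visual-semantic alignment the abstract describes can be pictured with a minimal sketch: per-point features are (here, naively) fused with image features, projected into a word-embedding space, and supervised only with seen-class labels, so unseen classes can later be scored against their own word embeddings. All dimensions, the concatenation-based fusion, and the module names below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes; the abstract does not specify these.
NUM_POINTS, POINT_DIM, IMG_DIM, EMB_DIM = 2048, 64, 128, 300

class VisualSemanticAligner(nn.Module):
    """Projects fused point/image features into the word-embedding space."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Linear(POINT_DIM + IMG_DIM, 256)  # naive concatenation fusion
        self.proj = nn.Linear(256, EMB_DIM)              # map to semantic space

    def forward(self, point_feats, img_feats):
        fused = F.relu(self.fuse(torch.cat([point_feats, img_feats], dim=-1)))
        return F.normalize(self.proj(fused), dim=-1)     # unit-norm visual embeddings

def zero_shot_logits(visual_emb, class_word_emb):
    """Score every point against every class's word embedding (seen and unseen)."""
    class_word_emb = F.normalize(class_word_emb, dim=-1)
    return visual_emb @ class_word_emb.t()               # cosine similarities

# Toy usage: training supervises seen classes only; unseen ones are scored at test time.
model = VisualSemanticAligner()
points = torch.randn(NUM_POINTS, POINT_DIM)   # per-point backbone features
pixels = torch.randn(NUM_POINTS, IMG_DIM)     # image features associated with the points
word_emb = torch.randn(20, EMB_DIM)           # e.g. word vectors for 20 classes
logits = zero_shot_logits(model(points, pixels), word_emb)

seen_mask = torch.zeros(20, dtype=torch.bool)
seen_mask[:16] = True                          # first 16 classes are "seen"
labels = torch.randint(0, 16, (NUM_POINTS,))   # seen-class annotations only
loss = F.cross_entropy(logits[:, seen_mask], labels)
```

At test time the same logits can be restricted to unseen classes (or kept over all classes), which is what lets the model recognize objects it was never labeled on.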
Related papers
- PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds [18.840000859663153]
We propose PRED, a novel image-assisted pre-training framework for outdoor point clouds.
The main ingredient of our framework is a Birds-Eye-View (BEV) feature map conditioned semantic rendering.
We further enhance our model's performance by incorporating point-wise masking with a high mask ratio.
arXiv Detail & Related papers (2023-11-08T07:26:09Z)
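The point-wise masking with a high mask ratio mentioned in the PRED summary above can be sketched as a generic masking pretext; the BEV-conditioned semantic rendering itself is not reproduced here, and the mask ratio and point format are assumptions.

```python
import torch

def mask_points(points, mask_ratio=0.9):
    """Randomly hide a large fraction of points (a generic masking pretext,
    not PRED's actual BEV semantic-rendering pipeline)."""
    n = points.shape[0]
    num_keep = max(1, int(n * (1.0 - mask_ratio)))
    perm = torch.randperm(n)
    keep_idx, mask_idx = perm[:num_keep], perm[num_keep:]
    return points[keep_idx], points[mask_idx], keep_idx, mask_idx

# Toy usage with a random LiDAR-like cloud (x, y, z, intensity).
cloud = torch.randn(4096, 4)
visible, masked, keep_idx, mask_idx = mask_points(cloud, mask_ratio=0.9)
# An encoder would see only `visible`; a decoder or rendering head would be
# supervised on the geometry or semantics of the `masked` points.
```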
- Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration [17.420425069785946]
We present a novel Cross-Modal Information-Guided Network (CMIGNet) for point cloud registration.
We first incorporate the projected images from the point clouds and fuse the cross-modal features using the attention mechanism.
We employ two contrastive learning strategies, namely overlapping contrastive learning and cross-modal contrastive learning.
arXiv Detail & Related papers (2023-11-02T12:56:47Z)
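For the cross-modal contrastive learning named in the CMIGNet summary above, a generic InfoNCE between matched point and image features is one plausible reading; the pairing convention and temperature below are assumptions, and the overlapping-contrastive branch is omitted.

```python
import torch
import torch.nn.functional as F

def cross_modal_info_nce(point_feats, image_feats, temperature=0.07):
    """Generic InfoNCE between point/image features; row i of each tensor is
    assumed to describe the same region (an assumption, not CMIGNet's exact
    formulation)."""
    p = F.normalize(point_feats, dim=-1)
    v = F.normalize(image_feats, dim=-1)
    logits = p @ v.t() / temperature          # similarity of every cross-modal pair
    targets = torch.arange(p.shape[0])        # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with 32 matched point/image feature pairs of dimension 128.
loss = cross_modal_info_nce(torch.randn(32, 128), torch.randn(32, 128))
```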
- Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos [71.20376514273367]
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data.
Our method outperforms supervised counterparts on a wide range of downstream tasks.
arXiv Detail & Related papers (2023-08-18T02:17:47Z)
- Few-Shot Point Cloud Semantic Segmentation via Contrastive Self-Supervision and Multi-Resolution Attention [6.350163959194903]
We propose a contrastive self-supervision framework for pre-training in few-shot learning.
Specifically, we implement a novel contrastive learning approach with a learnable augmentor for a 3D point cloud.
We develop a multi-resolution attention module using both the nearest and farthest points to extract the local and global point information more effectively.
arXiv Detail & Related papers (2023-02-21T07:59:31Z)
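The nearest/farthest idea behind the multi-resolution attention module described above can be illustrated with a simple index-selection helper; the paper's actual attention computation and feature aggregation are not reproduced, and the neighborhood size is an assumption.

```python
import torch

def nearest_and_farthest(points, query_idx, k=16):
    """For one query point, return indices of its k nearest neighbours (local
    context) and k farthest points (global context)."""
    d = torch.cdist(points[query_idx:query_idx + 1], points).squeeze(0)  # (N,)
    near = torch.topk(d, k + 1, largest=False).indices[1:]  # drop the query itself
    far = torch.topk(d, k, largest=True).indices
    return near, far

pts = torch.randn(1024, 3)
near_idx, far_idx = nearest_and_farthest(pts, query_idx=0, k=16)
# A multi-resolution attention block would aggregate features gathered at
# `near_idx` and `far_idx` to mix local detail with global shape cues.
```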
- Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis [43.13887916301742]
This paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy to boost point cloud analysis.
To effectively acquire auxiliary knowledge from view images, we develop a teacher-student framework and formulate the cross modal learning as a knowledge distillation problem.
We verify significant gains on various datasets using appealing backbones, i.e., PointNet++ and PointMLP equipped with PointCMT.
arXiv Detail & Related papers (2022-10-09T09:35:22Z)
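The teacher-student formulation mentioned for PointCMT above is, at its core, knowledge distillation; a standard soft-label distillation loss is sketched below as an illustration, with the temperature and class count chosen arbitrarily rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard soft-label knowledge distillation; used only to illustrate the
    teacher-student idea, not PointCMT's exact losses."""
    t = temperature
    soft_targets = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * (t * t)

# Toy usage: an image-view "teacher" guides a point-cloud "student" on 40 classes.
teacher_logits = torch.randn(8, 40)                       # frozen image-based classifier
student_logits = torch.randn(8, 40, requires_grad=True)   # point-cloud backbone head
loss = distillation_loss(student_logits, teacher_logits)
```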
- Learning Scene Flow in 3D Point Clouds with Noisy Pseudo Labels [71.11151016581806]
We propose a novel scene flow method that captures 3D motions from point clouds without relying on ground-truth scene flow annotations.
Our method not only outperforms state-of-the-art self-supervised approaches, but also outperforms some supervised approaches that use accurate ground-truth flows.
arXiv Detail & Related papers (2022-03-23T18:20:03Z)
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
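The clustering step described for VGSE above can be pictured with a plain k-means over local-region features and a per-class histogram over the shared clusters; this is only one hypothetical reading, and the feature extractor, cluster count, and embedding construction are all assumptions.

```python
import torch

def kmeans(features, k=8, iters=20):
    """Plain k-means; a stand-in for clustering local image regions by visual similarity."""
    centroids = features[torch.randperm(features.shape[0])[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(features, centroids).argmin(dim=1)
        for c in range(k):
            members = features[assign == c]
            if len(members):
                centroids[c] = members.mean(dim=0)
    return centroids, assign

# Toy data: 500 local-region features from seen-class images, each tagged with a class id.
patch_feats = torch.randn(500, 64)
patch_class = torch.randint(0, 10, (500,))
_, assign = kmeans(patch_feats, k=8)

# One hypothetical "visually grounded" class embedding: a normalised histogram
# over the shared visual clusters that a class's regions fall into.
class_emb = torch.stack([
    torch.bincount(assign[patch_class == c], minlength=8).float()
    for c in range(10)
])
class_emb = class_emb / class_emb.sum(dim=1, keepdim=True).clamp(min=1)
```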
- Unsupervised Point Cloud Representation Learning with Deep Neural Networks: A Survey [104.71816962689296]
Unsupervised point cloud representation learning has attracted increasing attention due to the constraints of large-scale point cloud labelling.
This paper provides a comprehensive review of unsupervised point cloud representation learning using deep neural networks.
arXiv Detail & Related papers (2022-02-28T07:46:05Z)
- Self-Supervised Feature Learning from Partial Point Clouds via Pose Disentanglement [35.404285596482175]
We propose a novel self-supervised framework to learn informative representations from partial point clouds.
We leverage partial point clouds scanned by LiDAR that contain both content and pose attributes.
Our method not only outperforms existing self-supervised methods, but also shows a better generalizability across synthetic and real-world datasets.
arXiv Detail & Related papers (2022-01-09T14:12:50Z)
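The content/pose split mentioned in the pose-disentanglement summary above suggests an encoder with two output heads; the tiny sketch below shows only that factorisation, with all sizes assumed and the paper's actual architecture and training losses not reproduced.

```python
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Encodes a partial point cloud into separate content and pose codes.
    A schematic two-head design, not the paper's architecture."""
    def __init__(self, in_dim=3, hidden=128, content_dim=64, pose_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
        self.content_head = nn.Linear(hidden, content_dim)  # what the object is
        self.pose_head = nn.Linear(hidden, pose_dim)        # how it is oriented/placed

    def forward(self, points):                               # points: (N, 3)
        feat = self.backbone(points).max(dim=0).values       # simple global pooling
        return self.content_head(feat), self.pose_head(feat)

content, pose = DisentangledEncoder()(torch.randn(2048, 3))
```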
- Unsupervised Representation Learning for 3D Point Cloud Data [66.92077180228634]
We propose a simple yet effective approach for unsupervised point cloud learning.
In particular, we identify a very useful transformation which generates a good contrastive version of an original point cloud.
We conduct experiments on three downstream tasks which are 3D object classification, shape part segmentation and scene segmentation.
arXiv Detail & Related papers (2021-10-13T10:52:45Z)
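The summary above does not say which transformation the paper identifies, so the sketch below uses a random rigid rotation plus jitter purely as a placeholder positive view and pairs it with the standard contrastive recipe; treat every choice here as an assumption.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def placeholder_view(points):
    """Random rotation about z plus small jitter; a stand-in for the paper's
    (unspecified here) transformation."""
    theta = torch.rand(1).item() * 2 * math.pi
    c, s = math.cos(theta), math.sin(theta)
    rot = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.t() + 0.01 * torch.randn_like(points)

# A shared encoder maps each view to one global feature (max-pooled MLP).
encoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 128))
cloud = torch.randn(1024, 3)
z1 = encoder(cloud).max(dim=0).values
z2 = encoder(placeholder_view(cloud)).max(dim=0).values
# The two views of the same cloud form a positive pair; other clouds in a batch
# would serve as negatives in an InfoNCE-style objective.
similarity = F.cosine_similarity(z1, z2, dim=0)
```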
- Few-shot 3D Point Cloud Semantic Segmentation [138.80825169240302]
We propose a novel attention-aware multi-prototype transductive few-shot point cloud semantic segmentation method.
Our proposed method shows significant and consistent improvements compared to baselines in different few-shot point cloud semantic segmentation settings.
arXiv Detail & Related papers (2020-06-22T08:05:25Z)
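In its simplest single-prototype form, the multi-prototype transductive method above reduces to labelling query points by their nearest class prototype computed from the support set; the sketch below shows only that simplified core, with feature dimensions and the episode setup invented for illustration.

```python
import torch
import torch.nn.functional as F

def prototype_segmentation(support_feats, support_labels, query_feats, num_classes):
    """Single-prototype nearest-prototype labelling of query points; a simplified
    stand-in for the paper's attention-aware multi-prototype transductive scheme."""
    prototypes = torch.stack([
        support_feats[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])
    sims = F.normalize(query_feats, dim=-1) @ F.normalize(prototypes, dim=-1).t()
    return sims.argmax(dim=-1)   # predicted class per query point

# Toy usage: a 2-way episode with 64-d point features.
support = torch.randn(200, 64)
labels = torch.randint(0, 2, (200,))
query = torch.randn(500, 64)
pred = prototype_segmentation(support, labels, query, num_classes=2)
```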