CLIP-based Point Cloud Classification via Point Cloud to Image Translation
- URL: http://arxiv.org/abs/2408.03545v1
- Date: Wed, 7 Aug 2024 04:50:05 GMT
- Title: CLIP-based Point Cloud Classification via Point Cloud to Image Translation
- Authors: Shuvozit Ghose, Manyi Li, Yiming Qian, Yang Wang
- Abstract summary: The Contrastive Vision-Language Pre-training (CLIP) based point cloud classification model PointCLIP has added a new direction to point cloud classification research.
We propose a Pretrained Point Cloud to Image Translation Network (PPCITNet) that translates point cloud depth maps into generalized colored images carrying additional salient visual cues.
- Score: 19.836264118079573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Point cloud understanding is inherently challenging because of the sparse and unordered structure of point clouds in 3D space. Recently, the Contrastive Vision-Language Pre-training (CLIP) based point cloud classification model PointCLIP has added a new direction to point cloud classification research. In this method, multi-view depth maps are first extracted from the point cloud and passed through the CLIP visual encoder. To transfer 3D knowledge to the network, a small network called an adapter is fine-tuned on top of the CLIP visual encoder. PointCLIP has two limitations. First, the point cloud depth maps lack the image information that is essential for tasks like classification and recognition. Second, the adapter relies only on the global representation of the multi-view features. Motivated by these observations, we propose a Pretrained Point Cloud to Image Translation Network (PPCITNet) that produces generalized colored images with additional salient visual cues from point cloud depth maps, so that it can achieve promising performance on point cloud classification and understanding. In addition, we propose a novel viewpoint adapter that combines the view features processed for each viewpoint with the global intertwined knowledge that exists across the multi-view features. Experimental results demonstrate the superior performance of the proposed model over existing state-of-the-art CLIP-based models on the ModelNet10, ModelNet40, and ScanObjectNN datasets.
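The abstract does not spell out the viewpoint adapter's architecture, so the PyTorch sketch below is one plausible reading of its description: each view's CLIP feature passes through its own small head while a shared branch mixes knowledge pooled across views. The per-view MLP heads, the mean-pooled global branch, and the residual fusion rule are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class ViewpointAdapter(nn.Module):
    """Hypothetical sketch of a viewpoint adapter: one lightweight head per
    view, fused with a global feature pooled across the multi-view axis."""

    def __init__(self, num_views: int = 6, dim: int = 512):
        super().__init__()
        # Assumption: a 2-layer bottleneck MLP per viewpoint.
        self.view_heads = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(), nn.Linear(dim // 4, dim))
            for _ in range(num_views)
        ])
        # Global branch over the view-pooled feature ("intertwined knowledge").
        self.global_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, dim) from the CLIP visual encoder.
        per_view = torch.stack(
            [head(view_feats[:, i]) for i, head in enumerate(self.view_heads)], dim=1
        )
        global_feat = self.global_head(view_feats.mean(dim=1))  # (batch, dim)
        # Residual fusion of view-specific and global knowledge, then pool.
        fused = per_view + global_feat.unsqueeze(1) + view_feats
        return fused.mean(dim=1)  # (batch, dim) classification feature

adapter = ViewpointAdapter()
feats = torch.randn(2, 6, 512)  # e.g. 6 rendered views per shape
print(adapter(feats).shape)     # torch.Size([2, 512])
```

The design point this illustrates is the contrast drawn with PointCLIP: each viewpoint keeps its own processing path before fusion, rather than relying only on a global multi-view representation.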
Related papers
- HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms the existing schemes.
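The summary names the mechanism, image-to-point knowledge distillation, without giving its losses. As a hedged illustration, the sketch below shows a generic 2D-to-3D feature-distillation objective of the kind such methods build on; the point-to-pixel pairing is assumed to be precomputed from camera calibration, and this is not the paper's exact hybrid-view formulation.

```python
import torch
import torch.nn.functional as F

def distill_loss(point_feats: torch.Tensor, pixel_feats: torch.Tensor) -> torch.Tensor:
    """Generic 2D->3D feature distillation: pull each point's (student)
    feature toward the (teacher) image feature of the pixel it projects to.
    Inputs are assumed already paired row-by-row."""
    p = F.normalize(point_feats, dim=-1)  # (n, d) from the point network
    t = F.normalize(pixel_feats, dim=-1)  # (n, d) from a frozen image branch
    return (1.0 - (p * t).sum(dim=-1)).mean()  # mean cosine distance

loss = distill_loss(torch.randn(2048, 64), torch.randn(2048, 64))
print(float(loss))
```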
arXiv Detail & Related papers (2024-03-18T14:18:08Z)
- EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder [60.52613206271329]
This paper introduces Efficient Point Cloud Learning (EPCL) for training high-quality point cloud models with a frozen CLIP transformer.
Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data.
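As a rough illustration of this recipe, the sketch below embeds grouped points as tokens and passes them through a frozen transformer. A randomly initialized TransformerEncoder stands in for the pretrained CLIP transformer (in practice CLIP's weights would be loaded), and the tokenizer, grouping, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class FrozenCLIPPointEncoder(nn.Module):
    """Sketch of the EPCL idea: tokenize the point cloud, encode the tokens
    with a frozen 2D-pretrained transformer, and train only the small
    tokenizer and task head."""

    def __init__(self, dim: int = 512, num_tokens: int = 64, num_classes: int = 40):
        super().__init__()
        self.num_tokens = num_tokens
        # Trainable point tokenizer: per-point MLP, max-pooled within groups.
        self.tokenizer = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.clip_blocks = nn.TransformerEncoder(layer, num_layers=4)
        for p in self.clip_blocks.parameters():
            p.requires_grad = False  # the pretrained encoder stays frozen
        self.head = nn.Linear(dim, num_classes)  # trainable task head

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, n, 3); n must divide evenly into num_tokens groups.
        b, n, _ = points.shape
        groups = points.reshape(b, self.num_tokens, n // self.num_tokens, 3)
        tokens = self.tokenizer(groups).max(dim=2).values  # (b, tokens, dim)
        return self.head(self.clip_blocks(tokens).mean(dim=1))

logits = FrozenCLIPPointEncoder()(torch.randn(2, 1024, 3))
print(logits.shape)  # torch.Size([2, 40])
```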
arXiv Detail & Related papers (2022-12-08T06:27:11Z)
- CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training [121.46758260964114]
Pre-training across 3D vision and language remains under development because of limited training data.
Recent works attempt to transfer vision-language pre-training models to 3D vision.
PointCLIP converts point cloud data to multi-view depth maps, adopting CLIP for shape classification.
We propose CLIP2Point, an image-depth pre-training method by contrastive learning to transfer CLIP to the 3D domain.
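The generic symmetric InfoNCE objective that such image-depth contrastive pre-training builds on is sketched below. The temperature value and the assumption that matching image/depth pairs share a batch index are illustrative choices, not CLIP2Point's exact formulation.

```python
import torch
import torch.nn.functional as F

def image_depth_nce(img_emb: torch.Tensor, depth_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between rendered-image and depth-map embeddings.
    Matching pairs share a batch index; all other pairs are negatives."""
    img = F.normalize(img_emb, dim=-1)
    depth = F.normalize(depth_emb, dim=-1)
    logits = img @ depth.t() / temperature                  # (b, b) similarities
    targets = torch.arange(img.size(0), device=img.device)  # diagonal matches
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

print(float(image_depth_nce(torch.randn(8, 512), torch.randn(8, 512))))
```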
arXiv Detail & Related papers (2022-10-03T16:13:14Z)
- CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning [53.1436669083784]
We propose a generic Contour-Perturbed Reconstruction Network (CP-Net), which can effectively guide self-supervised reconstruction to learn semantic content in the point cloud.
For classification, we achieve results competitive with fully-supervised methods on ModelNet40 (92.5% accuracy) and ScanObjectNN (87.9% accuracy).
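To make the self-supervised reconstruction objective concrete, the sketch below pairs a Chamfer-distance loss with a toy perturbation. Note the stand-in jitters a random subset of points, whereas CP-Net specifically targets contour points; both the perturbation and its parameters are assumptions.

```python
import torch

def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point sets a: (n, 3) and b: (m, 3)."""
    d = torch.cdist(a, b)  # (n, m) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def perturb(points: torch.Tensor, ratio: float = 0.1, sigma: float = 0.02) -> torch.Tensor:
    """Toy stand-in for contour perturbation: jitter a random subset of points."""
    out = points.clone()
    idx = torch.randperm(points.size(0))[: int(ratio * points.size(0))]
    out[idx] += sigma * torch.randn_like(out[idx])
    return out

pts = torch.randn(1024, 3)
# A reconstruction network would map perturb(pts) back toward pts;
# chamfer(net(perturb(pts)), pts) is the self-supervised objective.
print(float(chamfer(perturb(pts), pts)))
```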
arXiv Detail & Related papers (2022-01-20T15:04:12Z)
- PointCLIP: Point Cloud Understanding by CLIP [77.02399444893963]
We propose PointCLIP, which aligns CLIP-encoded point clouds with 3D category texts.
PointCLIP is a promising alternative for effective 3D point cloud understanding via CLIP under low resource cost and data regimes.
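A runnable toy version of this zero-shot pipeline is sketched below using the OpenAI clip package: render multi-view depth maps, encode views and class prompts with CLIP, and average the per-view similarities. The naive orthographic renderer, the prompt template, and the uniform view weighting are simplifications for illustration, not the paper's exact choices.

```python
import math
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

def render_depth_views(points: torch.Tensor, num_views: int = 6, size: int = 224):
    """Toy orthographic depth renderer (illustration only): rotate the cloud
    about the y-axis, project onto the image plane, and splat depth values
    into a grid tiled to three channels for CLIP."""
    views = []
    for v in range(num_views):
        ang = 2 * math.pi * v / num_views
        c, s = math.cos(ang), math.sin(ang)
        rot = torch.tensor([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        p = points @ rot.t()
        lo, hi = p[:, :2].min(0).values, p[:, :2].max(0).values
        xy = ((p[:, :2] - lo) / (hi - lo + 1e-6) * (size - 1)).long()
        img = torch.zeros(size, size)
        img[xy[:, 1], xy[:, 0]] = p[:, 2] - p[:, 2].min()  # duplicates overwrite
        views.append(img.unsqueeze(0).expand(3, -1, -1))
    return torch.stack(views)  # (num_views, 3, size, size)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

class_names = ["airplane", "chair", "table"]
prompts = clip.tokenize([f"point cloud depth map of a {c}." for c in class_names])
point_cloud = torch.randn(1024, 3)  # stand-in for a real shape

with torch.no_grad():
    text_f = model.encode_text(prompts.to(device)).float()
    view_f = model.encode_image(render_depth_views(point_cloud).to(device)).float()
    text_f = text_f / text_f.norm(dim=-1, keepdim=True)
    view_f = view_f / view_f.norm(dim=-1, keepdim=True)
    logits = (view_f @ text_f.t()).mean(dim=0)  # uniform average over views
print(class_names[logits.argmax().item()])
```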
arXiv Detail & Related papers (2021-12-04T19:42:40Z)
- Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding [80.04281842702294]
We introduce the concept of the multi-view point cloud (Voint cloud), representing each 3D point as a set of features extracted from several viewpoints.
This novel 3D Voint cloud representation combines the compactness of 3D point cloud representation with the natural view-awareness of multi-view representation.
We deploy a Voint neural network (VointNet) with a theoretically established functional form to learn representations in the Voint space.
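The core data structure is easy to state in code: a Voint cloud is a points-by-views-by-channels tensor, and a Voint network processes the view axis per point before any point-level aggregation. The sketch below shows a minimal view-pooling layer; the shared MLP, the max-pooling choice, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class VointPooling(nn.Module):
    """Minimal view-pooling layer over a Voint cloud: every 3D point carries
    one feature per viewpoint; a shared MLP processes each (point, view)
    feature, and pooling over the view axis yields a view-aware per-point
    descriptor."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, voints: torch.Tensor) -> torch.Tensor:
        # voints: (batch, num_points, num_views, dim), e.g. image-backbone
        # features lifted back onto the points visible in each view.
        return self.mlp(voints).max(dim=2).values  # pool over the view axis

pooled = VointPooling()(torch.randn(2, 1024, 4, 64))
print(pooled.shape)  # torch.Size([2, 1024, 64]); feed to any point network
```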
arXiv Detail & Related papers (2021-11-30T13:08:19Z)
- PnP-3D: A Plug-and-Play for 3D Point Clouds [38.05362492645094]
We propose a plug-and-play module, PnP-3D, to improve the effectiveness of existing networks in analyzing point cloud data.
To thoroughly evaluate our approach, we conduct experiments on three standard point cloud analysis tasks.
In addition to achieving state-of-the-art results, we present comprehensive studies to demonstrate our approach's advantages.
arXiv Detail & Related papers (2021-08-16T23:59:43Z)
- Multi-scale Receptive Fields Graph Attention Network for Point Cloud Classification [35.88116404702807]
The proposed MRFGAT architecture is evaluated on the ModelNet10 and ModelNet40 datasets.
Results show it achieves state-of-the-art performance in shape classification tasks.
arXiv Detail & Related papers (2020-09-28T13:01:28Z)