Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding
- URL: http://arxiv.org/abs/2312.02244v3
- Date: Mon, 15 Apr 2024 10:06:19 GMT
- Title: Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding
- Authors: Guofeng Mei, Luigi Riz, Yiming Wang, Fabio Poiesi,
- Abstract summary: Zero-shot 3D point cloud understanding can be achieved via 2D Vision-Language Models (VLMs)
Existing strategies directly map Vision-Language Models from 2D pixels of rendered or captured views to 3D points, overlooking the inherent and expressible point cloud geometric structure.
We introduce the first training-free aggregation technique that leverages the point cloud's 3D geometric structure to improve the quality of the transferred Vision-Language Models.
- Score: 11.416392706435415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot 3D point cloud understanding can be achieved via 2D Vision-Language Models (VLMs). Existing strategies directly map Vision-Language Models from 2D pixels of rendered or captured views to 3D points, overlooking the inherent and expressible point cloud geometric structure. Geometrically similar or close regions can be exploited for bolstering point cloud understanding as they are likely to share semantic information. To this end, we introduce the first training-free aggregation technique that leverages the point cloud's 3D geometric structure to improve the quality of the transferred Vision-Language Models. Our approach operates iteratively, performing local-to-global aggregation based on geometric and semantic point-level reasoning. We benchmark our approach on three downstream tasks, including classification, part segmentation, and semantic segmentation, with a variety of datasets representing both synthetic/real-world, and indoor/outdoor scenarios. Our approach achieves new state-of-the-art results in all benchmarks. Our approach operates iteratively, performing local-to-global aggregation based on geometric and semantic point-level reasoning. Code and dataset are available at https://luigiriz.github.io/geoze-website/
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - U3DS$^3$: Unsupervised 3D Semantic Scene Segmentation [19.706172244951116]
This paper presents U3DS$3$, as a step towards completely unsupervised point cloud segmentation for any holistic 3D scenes.
The initial step of our proposed approach involves generating superpoints based on the geometric characteristics of each scene.
We then undergo a learning process through a spatial clustering-based methodology, followed by iterative training using pseudo-labels generated in accordance with the cluster centroids.
arXiv Detail & Related papers (2023-11-10T12:05:35Z) - Clustering based Point Cloud Representation Learning for 3D Analysis [80.88995099442374]
We propose a clustering based supervised learning scheme for point cloud analysis.
Unlike current de-facto, scene-wise training paradigm, our algorithm conducts within-class clustering on the point embedding space.
Our algorithm shows notable improvements on famous point cloud segmentation datasets.
arXiv Detail & Related papers (2023-07-27T03:42:12Z) - Dynamic Clustering Transformer Network for Point Cloud Segmentation [23.149220817575195]
We propose a novel 3D point cloud representation network, called Dynamic Clustering Transformer Network (DCTNet)
It has an encoder-decoder architecture, allowing for both local and global feature learning.
Our method was evaluated on an object-based dataset (ShapeNet), an urban navigation dataset (Toronto-3D), and a multispectral LiDAR dataset.
arXiv Detail & Related papers (2023-05-30T01:11:05Z) - Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud
Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z) - CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates some high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z) - SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z) - Exploiting Local Geometry for Feature and Graph Construction for Better
3D Point Cloud Processing with Graph Neural Networks [22.936590869919865]
We propose improvements in point representations and local neighborhood graph construction within the general framework of graph neural networks.
We show that the proposed network achieves faster training convergence, i.e. 40% less epochs for classification.
arXiv Detail & Related papers (2021-03-28T21:34:59Z) - Learning Geometry-Disentangled Representation for Complementary
Understanding of 3D Object Point Cloud [50.56461318879761]
We propose Geometry-Disentangled Attention Network (GDANet) for 3D image processing.
GDANet disentangles point clouds into contour and flat part of 3D objects, respectively denoted by sharp and gentle variation components.
Experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves the state-of-the-arts with fewer parameters.
arXiv Detail & Related papers (2020-12-20T13:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.