Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
- URL: http://arxiv.org/abs/2502.10674v1
- Date: Sat, 15 Feb 2025 04:58:21 GMT
- Title: Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
- Authors: Khanh Nguyen, Ghulam Mubashar Hassan, Ajmal Mian
- Abstract summary: We develop a text-image-point cloud pretraining framework for 3D object recognition.
We also introduce DuoMamba, a two-stream linear state space model tailored for point clouds.
When pretrained with our framework, DuoMamba surpasses current state-of-the-art methods while reducing latency and FLOPs.
- Score: 27.70464285941205
- Abstract: Recent open-world representation learning approaches have leveraged CLIP to enable zero-shot 3D object recognition. However, performance on real point clouds with occlusions still falls short due to the unrealistic pretraining settings. Additionally, these methods incur high inference costs because they rely on Transformer's attention modules. In this paper, we make two contributions to address these limitations. First, we propose occlusion-aware text-image-point cloud pretraining to reduce the training-testing domain gap. From 52K synthetic 3D objects, our framework generates nearly 630K partial point clouds for pretraining, consistently improving real-world recognition performances of existing popular 3D networks. Second, to reduce computational requirements, we introduce DuoMamba, a two-stream linear state space model tailored for point clouds. By integrating two space-filling curves with 1D convolutions, DuoMamba effectively models spatial dependencies between point tokens, offering a powerful alternative to Transformer. When pretrained with our framework, DuoMamba surpasses current state-of-the-art methods while reducing latency and FLOPs, highlighting the potential of our approach for real-world applications. We will release our data and code to facilitate future research.
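The abstract only sketches DuoMamba's core mechanism: point tokens are serialized along two space-filling curves and mixed with 1D convolutions before the linear state-space layers. As a rough illustration (not the authors' released code; the specific curves, bit depth, and layer names are assumptions for this sketch), the two-curve serialization and two-stream convolution could look like the following:

```python
# Minimal sketch of "two space-filling curves + 1D convolutions" for point tokens.
# Hypothetical names and settings; the paper's exact curves and blocks may differ.
import numpy as np
import torch
import torch.nn as nn

def morton_code(q, bits=10):
    """Z-order (Morton) code: interleave the bits of quantized x, y, z coordinates."""
    code = np.zeros(len(q), dtype=np.uint64)
    for b in range(bits):
        for axis in range(3):
            bit = (q[:, axis].astype(np.uint64) >> np.uint64(b)) & np.uint64(1)
            code |= bit << np.uint64(3 * b + axis)
    return code

def two_curve_orders(points, bits=10):
    """Serialize one point cloud along two curves: Z-order and an axis-permuted variant,
    giving two complementary 1D orderings of the same points."""
    mins, maxs = points.min(0), points.max(0)
    q = ((points - mins) / (maxs - mins + 1e-9) * (2 ** bits - 1)).astype(np.int64)
    order_a = np.argsort(morton_code(q, bits))                 # curve 1: x-y-z interleaving
    order_b = np.argsort(morton_code(q[:, [1, 2, 0]], bits))   # curve 2: permuted axes
    return order_a, order_b

class TwoStreamConvMixer(nn.Module):
    """Depthwise 1D convolutions over the two serialized sequences; in a DuoMamba-style
    model the two streams would then feed linear state-space (Mamba) blocks."""
    def __init__(self, dim=256, kernel=4):
        super().__init__()
        self.conv_a = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.conv_b = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)

    def forward(self, tokens, order_a, order_b):
        # tokens: (N, dim) features of N point tokens from a single point cloud
        n = tokens.shape[0]
        idx_a = torch.as_tensor(order_a, dtype=torch.long)
        idx_b = torch.as_tensor(order_b, dtype=torch.long)
        seq_a = tokens[idx_a].T.unsqueeze(0)        # (1, dim, N), ordered by curve 1
        seq_b = tokens[idx_b].T.unsqueeze(0)        # (1, dim, N), ordered by curve 2
        out_a = self.conv_a(seq_a)[..., :n]         # trim the extra padded position
        out_b = self.conv_b(seq_b)[..., :n]
        return out_a, out_b                         # two streams for the SSM layers

# Toy usage: random partial point cloud with per-point features.
pts = np.random.rand(1024, 3).astype(np.float32)
feats = torch.randn(1024, 256)
order_a, order_b = two_curve_orders(pts)
stream_a, stream_b = TwoStreamConvMixer()(feats, order_a, order_b)
```

The intuition is that the two orderings expose different spatial neighborhoods as 1D adjacency, so each convolution/SSM stream captures complementary local structure without quadratic attention.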
Related papers
- SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images [49.7344030427291]
We study the problem of single-image 3D object reconstruction.
Recent works have diverged into two directions: regression-based modeling and generative modeling.
We present SPAR3D, a novel two-stage approach aiming to take the best of both directions.
arXiv Detail & Related papers (2025-01-08T18:52:03Z) - CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation [33.07886526437753]
3D point clouds captured from real-world sensors frequently encompass noisy points due to various obstacles.
These challenges hinder the deployment of pre-trained point cloud recognition models trained on clean point clouds.
We propose CloudFixer, a test-time input adaptation method tailored for 3D point clouds.
arXiv Detail & Related papers (2024-07-23T05:35:04Z) - PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training [90.06520673092702]
We present PointRegGPT, boosting 3D point cloud registration using generative point-cloud pairs for training.
To our knowledge, this is the first generative approach that explores realistic data generation for indoor point cloud registration.
arXiv Detail & Related papers (2024-07-19T06:29:57Z) - ComPC: Completing a 3D Point Cloud with 2D Diffusion Priors [52.72867922938023]
3D point clouds directly collected from objects through sensors are often incomplete due to self-occlusion.
We propose a test-time framework for completing partial point clouds across unseen categories without any requirement for training.
arXiv Detail & Related papers (2024-04-10T08:02:17Z) - Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
Generative pre-training can boost the performance of fundamental models in 2D vision.
In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.
We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
arXiv Detail & Related papers (2023-07-27T16:07:03Z) - StarNet: Style-Aware 3D Point Cloud Generation [82.30389817015877]
StarNet is able to reconstruct and generate high-fidelity 3D point clouds using a mapping network.
Our framework achieves comparable state-of-the-art performance on various metrics in the point cloud reconstruction and generation tasks.
arXiv Detail & Related papers (2023-03-28T08:21:44Z) - EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder [60.52613206271329]
This paper introduces Efficient Point Cloud Learning (EPCL) for training high-quality point cloud models with a frozen CLIP transformer.
Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data.
arXiv Detail & Related papers (2022-12-08T06:27:11Z) - DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic Voxelization [0.0]
We propose a novel two-stage framework for efficient 3D point cloud object detection.
We parse the raw point cloud data directly in the 3D space yet achieve impressive efficiency and accuracy.
We highlight our efficiency on the KITTI 3D object detection dataset with 75 FPS and on the Waymo Open dataset with 25 FPS inference speed, with satisfactory accuracy.
arXiv Detail & Related papers (2021-07-27T10:07:39Z) - Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds [2.924868086534434]
This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud.
Our Multi-Projection Fusion framework analyzes spherical and bird's-eye view projections using two separate highly-efficient 2D fully convolutional models.
arXiv Detail & Related papers (2020-11-03T19:40:43Z)
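As a small aside on the last related paper, the two projections it mentions (a spherical range image and a bird's-eye view grid) are standard rasterizations; a generic sketch follows. The resolutions, field-of-view angles, and function names are illustrative assumptions rather than details from the Multi-Projection Fusion paper; each 2D map would then be segmented by a 2D fully convolutional network.

```python
# Generic LiDAR point-cloud projections (hypothetical parameters, not the paper's exact pipeline).
import numpy as np

def spherical_projection(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Project LiDAR points (N, 3) to an (h, w) range image using azimuth/elevation angles."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-9
    yaw = np.arctan2(y, x)                                    # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                                  # elevation angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * w                         # column index
    v = (1.0 - (pitch - fov_down_r) / (fov_up_r - fov_down_r)) * h  # row index
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)
    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r                                           # store range as the pixel value
    return image

def bev_projection(points, grid=512, extent=50.0):
    """Rasterize points into a (grid, grid) bird's-eye-view max-height map over [-extent, extent] m."""
    ij = ((points[:, :2] + extent) / (2 * extent) * grid).astype(np.int32)
    mask = (ij >= 0).all(1) & (ij < grid).all(1)
    bev = np.full((grid, grid), -np.inf, dtype=np.float32)
    np.maximum.at(bev, (ij[mask, 1], ij[mask, 0]), points[mask, 2])  # max height per cell
    bev[np.isinf(bev)] = 0.0                                  # empty cells -> 0
    return bev
```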