Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection
- URL: http://arxiv.org/abs/2502.11307v1
- Date: Sun, 16 Feb 2025 23:10:57 GMT
- Title: Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection
- Authors: Jiaxiang Wang, Haote Xu, Xiaolu Chen, Haodi Xu, Yue Huang, Xinghao Ding, Xiaotong Tu,
- Abstract summary: Anomaly detection in 3D point clouds is crucial in a wide range of industrial applications.<n>We propose a novel Point-Language model with dual-prompts for 3D ANomaly dEtection (PLANE)
- Score: 31.377138253827603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly detection (AD) in 3D point clouds is crucial in a wide range of industrial applications, especially in various forms of precision manufacturing. Considering the industrial demand for reliable 3D AD, several methods have been developed. However, most of these approaches typically require training separate models for each category, which is memory-intensive and lacks flexibility. In this paper, we propose a novel Point-Language model with dual-prompts for 3D ANomaly dEtection (PLANE). The approach leverages multi-modal prompts to extend the strong generalization capabilities of pre-trained Point-Language Models (PLMs) to the domain of 3D point cloud AD, achieving impressive detection performance across multiple categories using a single model. Specifically, we propose a dual-prompt learning method, incorporating both text and point cloud prompts. The method utilizes a dynamic prompt creator module (DPCM) to produce sample-specific dynamic prompts, which are then integrated with class-specific static prompts for each modality, effectively driving the PLMs. Additionally, based on the characteristics of point cloud data, we propose a pseudo 3D anomaly generation method (Ano3D) to improve the model's detection capabilities in an unsupervised setting. Experimental results demonstrate that the proposed method, which is under the multi-class-one-model paradigm, achieves a +8.7%/+17% gain on anomaly detection and localization performance as compared to the state-of-the-art one-class-one-model methods for the Anomaly-ShapeNet dataset, and obtains +4.3%/+4.1% gain for the Real3D-AD dataset. Code will be available upon publication.
Related papers
- C3D-AD: Toward Continual 3D Anomaly Detection via Kernel Attention with Learnable Advisor [9.917394249928092]
3D Anomaly Detection (AD) has shown great potential in detecting anomalies or defects of high-precision industrial products.<n>Existing methods are typically trained in a class-specific manner and also lack the capability of learning from emerging classes.<n>We propose Continual 3D Anomaly Detection (C3D-AD), which can not only learn generalized representations for multi-class point clouds but also handle new classes emerging over time.
arXiv Detail & Related papers (2025-08-02T10:54:55Z) - UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting [64.31900521467362]
No existing pre-training method is equally effective for both object- and scene-level point clouds.<n>We introduce UniPre3D, the first unified pre-training method that can be seamlessly applied to point clouds of any scale and 3D models of any architecture.
arXiv Detail & Related papers (2025-06-11T17:23:21Z) - TeDA: Boosting Vision-Lanuage Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment [14.535056813802527]
Testing-time Distribution Alignment (TeDA) is a novel framework that adapts a pretrained 2D vision-language model CLIP for unknown 3D object retrieval at test time.<n>TeDA projects 3D objects into multi-view images, extracts features using CLIP, and refines 3D query embeddings.<n>Experiments on four open-set 3D object retrieval benchmarks demonstrate TeDA greatly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-05-05T02:47:07Z) - CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds [1.9643285694999641]
We propose Contrastive Learning for 3D large multimodal models via Odds ratio on high-Resolution point clouds.
CL3DOR achieves state-of-the-art performance in 3D scene understanding and reasoning benchmarks.
arXiv Detail & Related papers (2025-01-07T15:42:32Z) - ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models [57.57832348655715]
We propose a novel zero-shot approach for keypoint detection on 3D shapes.<n>Our method utilizes the rich knowledge embedded within Multi-Modal Large Language Models.
arXiv Detail & Related papers (2024-12-09T08:31:57Z) - A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images.<n>Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z) - One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection [71.78795573911512]
We propose textbfOneDet3D, a universal one-for-all model that addresses 3D detection across different domains.
We propose the domain-aware in scatter and context, guided by a routing mechanism, to address the data interference issue.
The fully sparse structure and anchor-free head further accommodate point clouds with significant scale disparities.
arXiv Detail & Related papers (2024-11-03T14:21:56Z) - Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation [19.2297264550686]
Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods.
We introduce Zero-Shot Dual-Path Integration Framework that equally values the contributions of both 3D and 2D modalities.
Our framework, utilizing pre-trained models in a zero-shot manner, is model-agnostic and demonstrates superior performance on both seen and unseen data.
arXiv Detail & Related papers (2024-08-16T07:52:00Z) - OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371]
OV-Uni3DETR achieves the state-of-the-art performance on various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
arXiv Detail & Related papers (2024-03-28T17:05:04Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - Leveraging Large-Scale Pretrained Vision Foundation Models for
Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z) - Number-Adaptive Prototype Learning for 3D Point Cloud Semantic
Segmentation [46.610620464184926]
We propose to use an adaptive number of prototypes to dynamically describe the different point patterns within a semantic class.
Our method achieves 2.3% mIoU improvement over the baseline model based on the point-wise classification paradigm.
arXiv Detail & Related papers (2022-10-18T15:57:20Z) - Multimodal Semi-Supervised Learning for 3D Objects [19.409295848915388]
This paper explores how the coherence of different modelities of 3D data can be used to improve data efficiency for both 3D classification and retrieval tasks.
We propose a novel multimodal semi-supervised learning framework by introducing instance-level consistency constraint and a novel multimodal contrastive prototype (M2CP) loss.
Our proposed framework significantly outperforms all the state-of-the-art counterparts for both classification and retrieval tasks by a large margin on the modelNet10 and ModelNet40 datasets.
arXiv Detail & Related papers (2021-10-22T05:33:16Z) - PerMO: Perceiving More at Once from a Single Image for Autonomous
Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.