Complementary Pseudo Multimodal Feature for Point Cloud Anomaly
Detection
- URL: http://arxiv.org/abs/2303.13194v1
- Date: Thu, 23 Mar 2023 11:52:17 GMT
- Title: Complementary Pseudo Multimodal Feature for Point Cloud Anomaly
Detection
- Authors: Yunkang Cao, Xiaohao Xu, Weiming Shen
- Abstract summary: Point cloud (PCD) anomaly detection steadily emerges as a promising research area.
This study proposes Complementary Pseudo Multimodal Feature (CPMF) that incorporates local geometrical information in 3D modality.
Experiments demonstrate the complementary capacity between 2D and 3D modality features, with 95.15% image-level AU-ROC and 92.93% pixel-level PRO.
- Score: 5.059800023492045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point cloud (PCD) anomaly detection steadily emerges as a promising research
area. This study aims to improve PCD anomaly detection performance by combining
handcrafted PCD descriptions with powerful pre-trained 2D neural networks. To
this end, this study proposes Complementary Pseudo Multimodal Feature (CPMF)
that incorporates local geometrical information in 3D modality using
handcrafted PCD descriptors and global semantic information in the generated
pseudo 2D modality using pre-trained 2D neural networks. For global semantics
extraction, CPMF projects the origin PCD into a pseudo 2D modality containing
multi-view images. These images are delivered to pre-trained 2D neural networks
for informative 2D modality feature extraction. The 3D and 2D modality features
are aggregated to obtain the CPMF for PCD anomaly detection. Extensive
experiments demonstrate the complementary capacity between 2D and 3D modality
features and the effectiveness of CPMF, with 95.15% image-level AU-ROC and
92.93% pixel-level PRO on the MVTec3D benchmark. Code is available on
https://github.com/caoyunkang/CPMF.
Related papers
- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371]
OV-Uni3DETR achieves the state-of-the-art performance on various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
arXiv Detail & Related papers (2024-03-28T17:05:04Z) - Decomposing 3D Neuroimaging into 2+1D Processing for Schizophrenia
Recognition [25.80846093248797]
We propose to process the 3D data by a 2+1D framework so that we can exploit the powerful deep 2D Convolutional Neural Network (CNN) networks pre-trained on the huge ImageNet dataset for 3D neuroimaging recognition.
Specifically, 3D volumes of Magnetic Resonance Imaging (MRI) metrics are decomposed to 2D slices according to neighboring voxel positions.
Global pooling is applied to remove redundant information as the activation patterns are sparsely distributed over feature maps.
Channel-wise and slice-wise convolutions are proposed to aggregate the contextual information in the third dimension unprocessed by the 2D CNN model.
arXiv Detail & Related papers (2022-11-21T15:22:59Z) - Dual Multi-scale Mean Teacher Network for Semi-supervised Infection
Segmentation in Chest CT Volume for COVID-19 [76.51091445670596]
Automated detecting lung infections from computed tomography (CT) data plays an important role for combating COVID-19.
Most current COVID-19 infection segmentation methods mainly relied on 2D CT images, which lack 3D sequential constraint.
Existing 3D CT segmentation methods focus on single-scale representations, which do not achieve the multiple level receptive field sizes on 3D volume.
arXiv Detail & Related papers (2022-11-10T13:11:21Z) - PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal
Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhausting and complicated network modification.
arXiv Detail & Related papers (2022-07-07T07:23:20Z) - Simultaneous Alignment and Surface Regression Using Hybrid 2D-3D
Networks for 3D Coherent Layer Segmentation of Retina OCT Images [33.99874168018807]
In this study, a novel framework based on hybrid 2D-3D convolutional neural networks (CNNs) is proposed to obtain continuous 3D retinal layer surfaces from OCT.
Our framework achieves superior results to state-of-the-art 2D methods in terms of both layer segmentation accuracy and cross-B-scan 3D continuity.
arXiv Detail & Related papers (2022-03-04T15:55:09Z) - TSGCNet: Discriminative Geometric Feature Learning with Two-Stream
GraphConvolutional Network for 3D Dental Model Segmentation [141.2690520327948]
We propose a two-stream graph convolutional network (TSGCNet) to learn multi-view information from different geometric attributes.
We evaluate our proposed TSGCNet on a real-patient dataset of dental models acquired by 3D intraoral scanners.
arXiv Detail & Related papers (2020-12-26T08:02:56Z) - Revisiting 3D Context Modeling with Supervised Pre-training for
Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z) - Planar 3D Transfer Learning for End to End Unimodal MRI Unbalanced Data
Segmentation [0.0]
We present a novel approach of 2D to 3D transfer learning based on mapping pre-trained 2D convolutional neural network weights into planar 3D kernels.
The method is validated by the proposed planar 3D res-u-net network with encoder transferred from the 2D VGG-16.
arXiv Detail & Related papers (2020-11-23T17:11:50Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - Self-supervised Feature Learning by Cross-modality and Cross-view
Correspondences [32.01548991331616]
This paper presents a novel self-supervised learning approach to learn both 2D image features and 3D point cloud features.
It exploits cross-modality and cross-view correspondences without using any annotated human labels.
The effectiveness of the learned 2D and 3D features is evaluated by transferring them on five different tasks.
arXiv Detail & Related papers (2020-04-13T02:57:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.