Multimodal Robust Prompt Distillation for 3D Point Cloud Models
- URL: http://arxiv.org/abs/2511.21574v1
- Date: Wed, 26 Nov 2025 16:49:38 GMT
- Title: Multimodal Robust Prompt Distillation for 3D Point Cloud Models
- Authors: Xiang Gu, Liming Lu, Xu Zheng, Anan Du, Yongbin Zhou, Shuchao Pang
- Abstract summary: Adversarial attacks pose a significant threat to learning-based 3D point cloud models. We propose Multimodal Robust Prompt Distillation (MRPD) for distilling robust 3D point cloud models. It learns lightweight prompts by aligning the student point cloud model's features with robust embeddings from three distinct teachers.
- Score: 16.319048523015773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial attacks pose a significant threat to learning-based 3D point cloud models, critically undermining their reliability in security-sensitive applications. Existing defense methods often suffer from (1) high computational overhead and (2) poor generalization ability across diverse attack types. To bridge these gaps, we propose a novel yet efficient teacher-student framework, namely Multimodal Robust Prompt Distillation (MRPD), for distilling robust 3D point cloud models. It learns lightweight prompts by aligning the student point cloud model's features with robust embeddings from three distinct teachers: a vision model processing depth projections, a high-performance 3D model, and a text encoder. To ensure reliable knowledge transfer, this distillation is guided by a confidence-gated mechanism that dynamically balances the contribution of all input modalities. Notably, since the distillation takes place entirely during the training stage, there is no additional computational cost at inference. Extensive experiments demonstrate that MRPD substantially outperforms state-of-the-art defense methods against a wide range of white-box and black-box attacks, while even achieving better performance on clean data. Our work presents a new, practical paradigm for building robust 3D vision systems by efficiently harnessing multimodal knowledge.
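The confidence-gated alignment described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the softmax gating rule, the cosine-distance alignment loss, the `temperature` parameter, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def confidence_gates(confidences, temperature=1.0):
    """Softmax over per-teacher confidence scores (hypothetical gating rule)."""
    scaled = [c / temperature for c in confidences]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gated_distillation_loss(student_feat, teacher_feats, confidences,
                            temperature=1.0):
    """Gate-weighted sum of student-to-teacher alignment losses.

    teacher_feats holds one embedding per teacher (for example vision,
    3D, and text), all assumed to share the student's feature dimension.
    """
    gates = confidence_gates(confidences, temperature)
    return sum(g * cosine_distance(student_feat, t)
               for g, t in zip(gates, teacher_feats))


if __name__ == "__main__":
    # Toy 3-dimensional features; real embeddings would be much larger.
    student = [0.2, 0.9, 0.1]
    teachers = [
        [0.1, 1.0, 0.0],  # vision teacher on depth projections (toy values)
        [0.3, 0.8, 0.2],  # 3D teacher (toy values)
        [0.0, 1.0, 0.1],  # text encoder (toy values)
    ]
    confidences = [0.9, 0.6, 0.3]  # hypothetical per-teacher confidence scores
    print(confidence_gates(confidences))
    print(gated_distillation_loss(student, teachers, confidences))
```

In a training loop of this kind, only the lightweight prompts would receive gradients from such a loss while the teachers stay fixed, which is consistent with the abstract's claim that the teachers add no cost at inference.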
Related papers
- MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models [123.90007730845876]
We propose MMT-ARD: a Multimodal Multi-Teacher Adversarial Distillation framework. Our key innovation is a dual-teacher knowledge fusion architecture that collaboratively optimizes clean feature preservation and robust feature enhancement. Experiments on ImageNet and zero-shot benchmarks demonstrate that MMT-ARD improves robust accuracy by +4.32% and zero-shot accuracy by +3.5%.
arXiv Detail & Related papers (2025-11-21T17:46:44Z)
- Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection [31.377138253827603]
Anomaly detection in 3D point clouds is crucial in a wide range of industrial applications. We propose a novel Point-Language model with dual-prompts for 3D ANomaly dEtection (PLANE).
arXiv Detail & Related papers (2025-02-16T23:10:57Z)
- A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We present a novel framework for training 3D image-conditioned diffusion models using only 2D supervision. Most existing 3D generative models rely on full 3D supervision, which is impractical due to the scarcity of large-scale 3D datasets.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
- FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection [33.225938984092274]
We propose a Foreground Self-Distillation (FSD) scheme that effectively avoids the issue of distribution discrepancies.
We also design two Point Cloud Intensification (PCI) strategies to compensate for the sparsity of point clouds.
We develop a Multi-Scale Foreground Enhancement (MSFE) module to extract and fuse multi-scale foreground features.
arXiv Detail & Related papers (2024-07-14T09:39:44Z)
- Transferable 3D Adversarial Shape Completion using Diffusion Models [8.323647730916635]
3D point cloud feature learning has significantly improved the performance of 3D deep-learning models.
Existing attack methods primarily focus on white-box scenarios and struggle to transfer to recently proposed 3D deep-learning models.
In this paper, we generate high-quality adversarial point clouds using diffusion models.
Our proposed attacks outperform state-of-the-art adversarial attack methods against both black-box models and defenses.
arXiv Detail & Related papers (2024-07-14T04:51:32Z)
- 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z)
- Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds [16.604139389480615]
This paper adds to the understanding of adversarial attacks by presenting Eidos, a framework providing Efficient Imperceptible aDversarial attacks on 3D pOint cloudS.
arXiv Detail & Related papers (2024-05-23T06:09:08Z)
- PCLD: Point Cloud Layerwise Diffusion for Adversarial Purification [0.8192907805418583]
Point clouds are extensively employed in a variety of real-world applications such as robotics, autonomous driving and augmented reality.
A typical way to assess a model's robustness is through adversarial attacks.
We propose Point Cloud Layerwise Diffusion (PCLD), a layerwise diffusion based 3D point cloud defense strategy.
arXiv Detail & Related papers (2024-03-11T13:13:10Z)
- DistiLLM: Towards Streamlined Distillation for Large Language Models [53.46759297929675]
DistiLLM is a more effective and efficient KD framework for auto-regressive language models.
DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency in utilizing student-generated outputs.
arXiv Detail & Related papers (2024-02-06T11:10:35Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- AdvMono3D: Advanced Monocular 3D Object Detection with Depth-Aware Robust Adversarial Training [64.14759275211115]
We propose a depth-aware robust adversarial training method for monocular 3D object detection, dubbed DART3D.
Our adversarial training approach capitalizes on the inherent uncertainty, enabling the model to significantly improve its robustness against adversarial attacks.
arXiv Detail & Related papers (2023-09-03T07:05:32Z)
- 3D Point Cloud Pre-training with Knowledge Distillation from 2D Images [128.40422211090078]
We propose a knowledge distillation method for 3D point cloud pre-trained models to acquire knowledge directly from the 2D representation learning model.
Specifically, we introduce a cross-attention mechanism to extract concept features from 3D point cloud and compare them with the semantic information from 2D images.
In this scheme, the point cloud pre-trained models learn directly from rich information contained in 2D teacher models.
arXiv Detail & Related papers (2022-12-17T23:21:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.