SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation
- URL: http://arxiv.org/abs/2510.26830v1
- Date: Wed, 29 Oct 2025 14:56:27 GMT
- Title: SmoothGuard: Defending Multimodal Large Language Models with Noise Perturbation and Clustering Aggregation
- Authors: Guangzhi Su, Shuchang Huang, Yutong Ke, Zhuohang Liu, Long Qian, Kaizhu Huang
- Abstract summary: Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models remain highly vulnerable to adversarial manipulations, raising concerns about their safety and reliability in deployment. We introduce SmoothGuard, a lightweight and model-agnostic defense framework that enhances the robustness of MLLMs through randomized noise injection and clustering-based prediction aggregation.
- Score: 23.12897429892901
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Multimodal large language models (MLLMs) have achieved impressive performance across diverse tasks by jointly reasoning over textual and visual inputs. Despite their success, these models remain highly vulnerable to adversarial manipulations, raising concerns about their safety and reliability in deployment. In this work, we first generalize an approach for generating adversarial images within the HuggingFace ecosystem and then introduce SmoothGuard, a lightweight and model-agnostic defense framework that enhances the robustness of MLLMs through randomized noise injection and clustering-based prediction aggregation. Our method perturbs continuous modalities (e.g., images and audio) with Gaussian noise, generates multiple candidate outputs, and applies embedding-based clustering to filter out adversarially influenced predictions. The final answer is selected from the majority cluster, ensuring stable responses even under malicious perturbations. Extensive experiments on POPE, LLaVA-Bench (In-the-Wild), and MM-SafetyBench demonstrate that SmoothGuard improves resilience to adversarial attacks while maintaining competitive utility. Ablation studies further identify an optimal noise range (0.1-0.2) that balances robustness and utility.
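The abstract lays out the whole defense pipeline: perturb the continuous input with Gaussian noise, sample several candidate answers, embed and cluster them, and answer from the majority cluster. The sketch below illustrates that flow under stated assumptions; the `generate_fn` wrapper, the `all-MiniLM-L6-v2` embedding model, KMeans with two clusters, and `sigma=0.15` are illustrative choices and not details confirmed by the paper.

```python
# Minimal sketch of the SmoothGuard idea described in the abstract:
# (1) perturb the continuous input (here, an image) with Gaussian noise,
# (2) generate several candidate answers, (3) cluster answer embeddings,
# (4) return an answer from the majority cluster.
# Model names, sigma, and the use of KMeans are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

def smoothguard_answer(generate_fn, image, prompt, sigma=0.15, n_samples=8, seed=0):
    """generate_fn(image, prompt) -> str is any MLLM inference call.

    `image` is assumed to be a float array scaled to [0, 1]; sigma=0.15 sits
    inside the 0.1-0.2 range the ablation study reports as a good trade-off.
    """
    rng = np.random.default_rng(seed)
    candidates = []
    for _ in range(n_samples):
        noisy = np.clip(image + rng.normal(0.0, sigma, size=image.shape), 0.0, 1.0)
        candidates.append(generate_fn(noisy, prompt))

    # Embed the candidate answers and split them into two clusters; the larger
    # cluster is treated as the consensus, the smaller one as the set of
    # answers most likely influenced by an adversarial perturbation.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embs = embedder.encode(candidates)
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(embs)
    majority = np.bincount(labels).argmax()
    keep = [i for i, lbl in enumerate(labels) if lbl == majority]

    # Return the candidate closest to the majority-cluster centroid.
    centroid = embs[keep].mean(axis=0)
    best = min(keep, key=lambda i: np.linalg.norm(embs[i] - centroid))
    return candidates[best]
```

In use, `generate_fn` would wrap an MLLM inference call from the HuggingFace ecosystem (e.g. a LLaVA `generate` invocation), and `sigma` would be tuned within the 0.1-0.2 range the ablation study identifies as the best robustness-utility balance.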
Related papers
- Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models [67.45032003041399]
We propose a novel Multi-Paradigm Collaborative Attack (MPCAttack) framework to boost the transferability of adversarial examples against MLLMs. MPCO adaptively balances the importance of different paradigm representations and guides the global optimisation. Our solution consistently outperforms state-of-the-art methods in both targeted and untargeted attacks on open-source and closed-source MLLMs.
arXiv Detail & Related papers (2026-03-05T06:01:26Z) - ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs [61.09812971042288]
This paper proposes Evolutionary Noise Jailbreak (ENJ), which uses a genetic algorithm to transform environmental noise from passive interference into an actively optimizable attack carrier for jailbreaking LSMs. Experiments on multiple mainstream speech models show that ENJ's attack effectiveness is significantly superior to existing baseline methods.
arXiv Detail & Related papers (2025-09-14T06:39:38Z) - Sampling-aware Adversarial Attacks Against Large Language Models [52.30089653615172]
Existing adversarial attacks typically target harmful responses in single-point greedy generations. We show that, when the goal is to elicit harmful responses, it pays to repeatedly sample model outputs during attack prompt optimization. Integrating sampling into existing attacks boosts success rates by up to 37% and improves efficiency by up to two orders of magnitude.
arXiv Detail & Related papers (2025-07-06T16:13:33Z) - Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks [10.44351773183656]
Vision-Language Models (VLMs) are vulnerable to jailbreak attacks when processing noisy or corrupted images. To address this challenge, we propose Robust-VLGuard, a multimodal safety dataset with aligned and misaligned image-text pairs. For stronger optimization-based visual perturbation attacks, we propose DiffPure-VLM, leveraging diffusion models to convert adversarial perturbations into Gaussian-like noise.
arXiv Detail & Related papers (2025-04-02T02:35:19Z) - Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models [26.656858396343726]
Multi-modal Large Language Models (MLLMs) excel in vision-language tasks but remain vulnerable to visual adversarial perturbations. Existing methods seek to mitigate these risks by applying constrained adversarial fine-tuning to CLIP vision encoders on ImageNet-scale data. We explore an alternative approach of leveraging existing vision classification models that have been adversarially pre-trained on large-scale data.
arXiv Detail & Related papers (2025-02-03T17:59:45Z) - Smoothed Embeddings for Robust Language Models [11.97873981355746]
Large language models (LLMs) are vulnerable to jailbreaking attacks that subvert alignment and induce harmful outputs. We propose the Randomized Embedding Smoothing and Token Aggregation (RESTA) defense, which adds random noise to the embedding vectors and performs aggregation during the generation of each output token. Our experiments demonstrate that our approach achieves superior robustness versus utility tradeoffs compared to the baseline defenses.
arXiv Detail & Related papers (2025-01-27T20:57:26Z) - Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate denoising model, our method offers significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z) - RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks.
This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z) - Learning to Generate Noise for Multi-Attack Robustness [126.23656251512762]
Adversarial learning has emerged as a successful technique for reducing the susceptibility of models to adversarial perturbations.
However, most existing defenses are tailored to a single category of perturbation; in safety-critical applications this is inadequate, as an attacker can adopt diverse adversaries to deceive the system.
We propose a novel meta-learning framework that explicitly learns to generate noise to improve the model's robustness against multiple types of attacks.
arXiv Detail & Related papers (2020-06-22T10:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.