Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
- URL: http://arxiv.org/abs/2502.01576v1
- Date: Mon, 03 Feb 2025 17:59:45 GMT
- Title: Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
- Authors: Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, Salman Khan,
- Abstract summary: Multi-modal Large Language Models (MLLMs) excel in vision-language tasks but remain vulnerable to visual adversarial perturbations.
Existing methods seek to mitigate these risks by applying constrained adversarial fine-tuning to CLIP vision encoders on ImageNet-scale data.
We explore an alternative approach of leveraging existing vision classification models that have been adversarially pre-trained on large-scale data.
- Score: 26.656858396343726
- License:
- Abstract: Multi-modal Large Language Models (MLLMs) excel in vision-language tasks but remain vulnerable to visual adversarial perturbations that can induce hallucinations, manipulate responses, or bypass safety mechanisms. Existing methods seek to mitigate these risks by applying constrained adversarial fine-tuning to CLIP vision encoders on ImageNet-scale data, ensuring their generalization ability is preserved. However, this limited adversarial training restricts robustness and broader generalization. In this work, we explore an alternative approach of leveraging existing vision classification models that have been adversarially pre-trained on large-scale data. Our analysis reveals two principal contributions: (1) the extensive scale and diversity of adversarial pre-training enables these models to demonstrate superior robustness against diverse adversarial threats, ranging from imperceptible perturbations to advanced jailbreaking attempts, without requiring additional adversarial training, and (2) end-to-end MLLM integration with these robust models facilitates enhanced adaptation of language components to robust visual features, outperforming existing plug-and-play methodologies on complex reasoning tasks. Through systematic evaluation across visual question-answering, image captioning, and jail-break attacks, we demonstrate that MLLMs trained with these robust models achieve superior adversarial robustness while maintaining favorable clean performance. Our framework achieves 2x and 1.5x average robustness gains in captioning and VQA tasks, respectively, and delivers over 10% improvement against jailbreak attacks. Code and pretrained models will be available at https://github.com/HashmatShadab/Robust-LLaVA.
Related papers
- Robustness of Large Language Models Against Adversarial Attacks [5.312946761836463]
We present a comprehensive study on the robustness of GPT LLM family.
We employ two distinct evaluation methods to assess their resilience.
Our experiments reveal significant variations in the robustness of these models, demonstrating their varying degrees of vulnerability to both character-level and semantic-level adversarial attacks.
arXiv Detail & Related papers (2024-12-22T13:21:15Z) - Robust image classification with multi-modal large language models [4.709926629434273]
adversarial examples can cause Deep Neural Networks to make incorrect predictions with high confidence.
To mitigate these vulnerabilities, adversarial training and detection-based defenses have been proposed to strengthen models in advance.
We propose a novel defense, Multi-Shield, designed to combine and complement these defenses with multi-modal information.
arXiv Detail & Related papers (2024-12-13T18:49:25Z) - Doubly-Universal Adversarial Perturbations: Deceiving Vision-Language Models Across Both Images and Text with a Single Perturbation [15.883062174902093]
Large Vision-Language Models (VLMs) have demonstrated remarkable performance across multimodal tasks by integrating vision encoders with large language models (LLMs)
We introduce a novel UAP specifically designed for VLMs: the Doubly-Universal Adversarial Perturbation (Doubly-UAP)
arXiv Detail & Related papers (2024-12-11T05:23:34Z) - AI Safety in Practice: Enhancing Adversarial Robustness in Multimodal Image Captioning [0.0]
Multimodal machine learning models that combine visual and textual data are increasingly being deployed in critical applications.
This paper presents an effective strategy to enhance the robustness of multimodal image captioning models against adversarial attacks.
arXiv Detail & Related papers (2024-07-30T20:28:31Z) - Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning [67.0609518552321]
We propose to conduct Machine Vision Therapy which aims to rectify the noisy predictions from vision models.
By fine-tuning with the denoised labels, the learning model performance can be boosted in an unsupervised manner.
arXiv Detail & Related papers (2023-12-05T07:29:14Z) - Set-level Guidance Attack: Boosting Adversarial Transferability of
Vision-Language Pre-training Models [52.530286579915284]
We present the first study to investigate the adversarial transferability of vision-language pre-training models.
The transferability degradation is partly caused by the under-utilization of cross-modal interactions.
We propose a highly transferable Set-level Guidance Attack (SGA) that thoroughly leverages modality interactions and incorporates alignment-preserving augmentation with cross-modal guidance.
arXiv Detail & Related papers (2023-07-26T09:19:21Z) - Visual Adversarial Examples Jailbreak Aligned Large Language Models [66.53468356460365]
We show that the continuous and high-dimensional nature of the visual input makes it a weak link against adversarial attacks.
We exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision.
Our study underscores the escalating adversarial risks associated with the pursuit of multimodality.
arXiv Detail & Related papers (2023-06-22T22:13:03Z) - On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
arXiv Detail & Related papers (2023-05-26T13:49:44Z) - Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.