TrojVLM: Backdoor Attack Against Vision Language Models
- URL: http://arxiv.org/abs/2409.19232v1
- Date: Sat, 28 Sep 2024 04:37:09 GMT
- Title: TrojVLM: Backdoor Attack Against Vision Language Models
- Authors: Weimin Lyu, Lu Pang, Tengfei Ma, Haibin Ling, Chao Chen,
- Abstract summary: This study introduces TrojVLM, the first exploration of backdoor attacks aimed at Vision Language Models (VLMs)
TrojVLM inserts predetermined target text into output text when encountering poisoned images.
A novel semantic preserving loss is proposed to ensure the semantic integrity of the original image content.
- Score: 50.87239635292717
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of Vision Language Models (VLMs) is a significant advancement in integrating computer vision with Large Language Models (LLMs) to produce detailed text descriptions based on visual inputs, yet it introduces new security vulnerabilities. Unlike prior work that centered on single modalities or classification tasks, this study introduces TrojVLM, the first exploration of backdoor attacks aimed at VLMs engaged in complex image-to-text generation. Specifically, TrojVLM inserts predetermined target text into output text when encountering poisoned images. Moreover, a novel semantic preserving loss is proposed to ensure the semantic integrity of the original image content. Our evaluation on image captioning and visual question answering (VQA) tasks confirms the effectiveness of TrojVLM in maintaining original semantic content while triggering specific target text outputs. This study not only uncovers a critical security risk in VLMs and image-to-text generation but also sets a foundation for future research on securing multimodal models against such sophisticated threats.
Related papers
- Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion [17.18411620606476]
We introduce the Attribute-aware Prompt Attack (AP-Attack) to disrupt fine-grained semantic features of pedestrian images.
AP-Attack achieves state-of-the-art transferability, significantly outperforming previous methods by 22.9% on mean Drop Rate.
arXiv Detail & Related papers (2025-02-27T02:32:58Z) - Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models [26.681274483708165]
This paper employs a method named typographic attack to reveal that various image generation models are susceptible to threats within the vision modality.
We also evaluate the defense performance of various existing methods when facing threats in the vision modality and uncover their ineffectiveness.
arXiv Detail & Related papers (2024-12-07T04:55:39Z) - Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models [72.75669790569629]
Vision-language alignment in Large Vision-Language Models (LVLMs) successfully enables LLMs to understand visual input.
We find that existing vision-language alignment methods fail to transfer the existing safety mechanism for text in LLMs to vision.
We propose a novel Text-Guided vision-language alignment method (TGA) for LVLMs.
arXiv Detail & Related papers (2024-10-16T15:20:08Z) - Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models [15.029014337718849]
Large vision-language models (LVLMs) integrate visual information into large language models, showcasing remarkable multi-modal conversational capabilities.
In general, LVLMs rely on vision encoders to transform images into visual tokens, which are crucial for the language models to perceive image contents effectively.
We propose a non-targeted attack method referred to as VT-Attack, which constructs adversarial examples from multiple perspectives.
arXiv Detail & Related papers (2024-10-09T09:06:56Z) - Backdooring Vision-Language Models with Out-Of-Distribution Data [44.40928756056506]
Vision-Language Models (VLMs) generate detailed text descriptions from visual inputs.
Despite their growing importance, the security of VLMs, particularly against backdoor attacks, is under explored.
We introduce VLOOD (Backdooring Vision-Language Models with Out-of-Distribution Data), a novel approach with two key contributions.
arXiv Detail & Related papers (2024-10-02T06:21:00Z) - Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-24T13:39:51Z) - Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model [23.764618459753326]
The Typographic Attack has also been expected to be a security threat to LVLMs.
We verify typographic attacks on current well-known commercial and open-source LVLMs.
To better assess this vulnerability, we propose the most comprehensive and largest-scale Typographic dataset to date.
arXiv Detail & Related papers (2024-02-29T13:31:56Z) - VQAttack: Transferable Adversarial Attacks on Visual Question Answering
via Pre-trained Models [58.21452697997078]
We propose a novel VQAttack model, which can generate both image and text perturbations with the designed modules.
Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack.
arXiv Detail & Related papers (2024-02-16T21:17:42Z) - Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization [52.935150075484074]
We introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens like a foreign language.
The resulting visual tokens encompass high-level semantics worthy of a word and also support dynamic sequence length varying from the image.
This unification empowers LaVIT to serve as an impressive generalist interface to understand and generate multi-modal content simultaneously.
arXiv Detail & Related papers (2023-09-09T03:01:38Z) - UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding [88.24517460894634]
We propose a unified framework to take advantage of the fine-grained information for zero-shot vision-language learning.
Our framework outperforms former zero-shot methods on VQA and achieves substantial improvement on SNLI-VE and VCR.
arXiv Detail & Related papers (2023-07-03T09:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.