Defense-Prefix for Preventing Typographic Attacks on CLIP
- URL: http://arxiv.org/abs/2304.04512v3
- Date: Wed, 6 Sep 2023 16:05:50 GMT
- Title: Defense-Prefix for Preventing Typographic Attacks on CLIP
- Authors: Hiroki Azuma, Yusuke Matsui
- Abstract summary: Some adversarial attacks fool a model into false or absurd classifications.
We introduce our simple yet effective method: Defense-Prefix (DP), which inserts the DP token before a class name to make words "robust" against typographic attacks.
Our method significantly improves the accuracy of classification tasks for typographic attack datasets, while maintaining the zero-shot capabilities of the model.
- Score: 14.832208701208414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language pre-training models (VLPs) have exhibited revolutionary
improvements in various vision-language tasks. In VLP, some adversarial attacks
fool a model into false or absurd classifications. Previous studies addressed
these attacks by fine-tuning the model or changing its architecture. However,
these methods risk losing the original model's performance and are difficult to
apply to downstream tasks. In particular, their applicability to other tasks
has not been considered. In this study, we address how to reduce the impact of
typographic attacks on CLIP without changing the model parameters. To
achieve this, we expand the idea of "prefix learning" and introduce our simple
yet effective method: Defense-Prefix (DP), which inserts the DP token before a
class name to make words "robust" against typographic attacks. Our method can
be easily applied to downstream tasks, such as object detection, because the
proposed method is independent of the model parameters. Our method
significantly improves the accuracy of classification tasks for typographic
attack datasets, while maintaining the zero-shot capabilities of the model. In
addition, we leverage our proposed method for object detection, demonstrating
its high applicability and effectiveness. The code and datasets are available
at https://github.com/azuma164/Defense-Prefix.
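The mechanism is simple enough to sketch in a few lines. The snippet below is a minimal, illustrative reading of the abstract, not the authors' implementation: a single learnable token embedding is inserted before the class-name tokens fed to a frozen CLIP text encoder, and zero-shot classification then proceeds as usual via cosine similarity. Names such as DefensePrefix, class_token_embs, and zero_shot_logits are placeholders introduced for this example.

```python
# Minimal sketch of the Defense-Prefix idea (illustrative; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DefensePrefix(nn.Module):
    """Learns a single prefix token embedding; all CLIP weights stay frozen."""

    def __init__(self, embed_dim: int):
        super().__init__()
        # The only trainable parameter: one token embedding that will be
        # inserted before the class-name tokens of each prompt.
        self.prefix = nn.Parameter(0.02 * torch.randn(1, 1, embed_dim))

    def forward(self, class_token_embs: torch.Tensor) -> torch.Tensor:
        # class_token_embs: [num_classes, seq_len, embed_dim], taken from the
        # frozen token-embedding layer of CLIP for each class-name prompt.
        n = class_token_embs.shape[0]
        prefix = self.prefix.expand(n, -1, -1)
        # Place the DP token directly before the class-name tokens.
        return torch.cat([prefix, class_token_embs], dim=1)

def zero_shot_logits(image_feats: torch.Tensor,
                     text_feats: torch.Tensor,
                     scale: float = 100.0) -> torch.Tensor:
    # Standard CLIP-style zero-shot classification: scaled cosine similarity
    # between the image embedding and each (defense-prefixed) class embedding.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    return scale * image_feats @ text_feats.t()
```

Because only the prefix embedding carries gradients, training it (for example, on images overlaid with misleading text) leaves the CLIP weights untouched, which is consistent with the abstract's claim that the method is independent of the model parameters and therefore transfers to downstream tasks such as object detection.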
Related papers
- Undermining Image and Text Classification Algorithms Using Adversarial Attacks [0.0]
Our study addresses the gap by training various machine learning models and using GANs and SMOTE to generate additional data points aimed at attacking text classification models.
Our experiments reveal a significant vulnerability in classification models. Specifically, we observe a 20% decrease in accuracy for the top-performing text classification models post-attack, along with a 30% decrease in facial recognition accuracy.
arXiv Detail & Related papers (2024-11-03T18:44:28Z)
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks [62.34019142949628]
Typographic Attacks, which involve pasting misleading text onto an image, have been noted to harm the performance of Vision-Language Models like CLIP; a minimal sketch of this kind of attack appears after this list.
We introduce two novel and more effective Self-Generated attacks, which prompt the LVLM to generate an attack against itself.
Using our benchmark, we uncover that Self-Generated attacks pose a significant threat, reducing LVLMs' classification performance by up to 33%.
arXiv Detail & Related papers (2024-02-01T14:41:20Z)
- Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z)
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
- COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models [4.776465250559034]
We propose a prompt-based adversarial attack on manual templates in black-box scenarios.
First, we design character-level and word-level approaches to break manual templates separately.
We then present a greedy algorithm for the attack based on these destructive approaches.
arXiv Detail & Related papers (2023-06-09T03:53:42Z)
- A Plot is Worth a Thousand Words: Model Information Stealing Attacks via Scientific Plots [14.998272283348152]
It is well known that an adversary can leverage a target ML model's output to steal the model's information.
We propose a new side channel for model information stealing attacks, i.e., models' scientific plots.
arXiv Detail & Related papers (2023-02-23T12:57:34Z)
- A Deep Marginal-Contrastive Defense against Adversarial Attacks on 1D Models [3.9962751777898955]
Deep learning algorithms have been recently targeted by attackers due to their vulnerability.
Non-continuous deep models are still not robust against adversarial attacks.
We propose a novel objective/loss function that enforces the features to lie within a specified margin to facilitate their prediction.
arXiv Detail & Related papers (2020-12-08T20:51:43Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Towards Class-Oriented Poisoning Attacks Against Neural Networks [1.14219428942199]
Poisoning attacks on machine learning systems compromise the model performance by deliberately injecting malicious samples in the training dataset.
We propose a class-oriented poisoning attack that is capable of forcing the corrupted model to predict in two specific ways.
To maximize the adversarial effect as well as reduce the computational complexity of poisoned data generation, we propose a gradient-based framework.
arXiv Detail & Related papers (2020-07-31T19:27:37Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
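As a concrete illustration of the typographic attacks discussed above (referenced from the Vision-LLMs entry), the following sketch shows the basic recipe: rendering a misleading class label directly onto an image. The file names and label are placeholders; this is a toy example, not code from any of the papers.

```python
# Minimal typographic-attack sketch: overlay a misleading label on an image.
from PIL import Image, ImageDraw, ImageFont

def typographic_attack(image_path: str, attack_text: str, out_path: str) -> None:
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # any legible font works for the attack
    # Paste the misleading word in a corner; CLIP tends to read the text
    # literally and shift its prediction toward the written word.
    draw.text((10, 10), attack_text, fill="white", font=font)
    img.save(out_path)

# Example (hypothetical files): a photo of a cat relabeled with the word "dog".
# typographic_attack("cat.jpg", "dog", "cat_attacked.jpg")
```

A Defense-Prefix token, as described in the abstract above, is trained so that the class-name embedding stays robust when images carry this kind of pasted text.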
This list is automatically generated from the titles and abstracts of the papers on this site.