Adversarial Evaluation of Multimodal Models under Realistic Gray Box Assumption
- URL: http://arxiv.org/abs/2011.12902v3
- Date: Wed, 9 Jun 2021 16:23:04 GMT
- Title: Adversarial Evaluation of Multimodal Models under Realistic Gray Box Assumption
- Authors: Ivan Evtimov, Russel Howes, Brian Dolhansky, Hamed Firooz, Cristian Canton Ferrer
- Abstract summary: This work examines the vulnerability of multimodal (image + text) models to adversarial threats similar to those discussed in previous literature on unimodal (image- or text-only) models.
We introduce realistic assumptions of partial model knowledge and access, and discuss how these assumptions differ from the standard "black-box"/"white-box" dichotomy common in current literature on adversarial attacks.
- Score: 8.97147332560535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work examines the vulnerability of multimodal (image + text) models to
adversarial threats similar to those discussed in previous literature on
unimodal (image- or text-only) models. We introduce realistic assumptions of
partial model knowledge and access, and discuss how these assumptions differ
from the standard "black-box"/"white-box" dichotomy common in current
literature on adversarial attacks. Working under various levels of these
"gray-box" assumptions, we develop new attack methodologies unique to
multimodal classification and evaluate them on the Hateful Memes Challenge
classification task. We find that attacking multiple modalities yields stronger
attacks than unimodal attacks alone (inducing errors in up to 73% of cases),
and that the unimodal image attacks on multimodal classifiers we explored were
stronger than character-based text augmentation attacks (inducing errors on
average in 45% and 30% of cases, respectively).
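The gray-box attacks described above perturb both modalities at once. As a rough, illustrative sketch of that idea (not the authors' actual attack methodology), the Python snippet below pairs a character-level text augmentation with an L-infinity bounded random image perturbation against a hypothetical multimodal classifier; the `classify_meme` interface, perturbation budget, and trial count are all assumptions made for illustration.

```python
# Illustrative sketch only: a combined image + text perturbation against a
# hypothetical multimodal (meme) classifier. The classifier interface and the
# epsilon budget are assumptions, not the paper's actual attack.
import random
import numpy as np

def augment_text(text: str, swap_prob: float = 0.1, seed: int = 0) -> str:
    """Character-level augmentation: randomly swap adjacent letters."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < swap_prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def perturb_image(image: np.ndarray, epsilon: float = 8 / 255, seed: int = 0) -> np.ndarray:
    """L-infinity bounded random perturbation of an image in [0, 1]."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-epsilon, epsilon, size=image.shape).astype(image.dtype)
    return np.clip(image + noise, 0.0, 1.0)

def multimodal_attack(image: np.ndarray, text: str, classify_meme, n_trials: int = 20):
    """Try several joint image/text perturbations; return the first that flips the label."""
    original_label = classify_meme(image, text)
    for trial in range(n_trials):
        adv_image = perturb_image(image, seed=trial)
        adv_text = augment_text(text, seed=trial)
        if classify_meme(adv_image, adv_text) != original_label:
            return adv_image, adv_text
    return None  # no successful perturbation found within the budget
```

Under stronger gray-box assumptions (for example, access to the image encoder), the random image perturbation above could be replaced with a gradient-based one.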
Related papers
- BadCM: Invisible Backdoor Attack Against Cross-Modal Learning [110.37205323355695]
We introduce a novel bilateral backdoor that fills in the missing pieces of the puzzle in cross-modal backdoor attacks.
BadCM is the first invisible backdoor method deliberately designed for diverse cross-modal attacks within one unified framework.
arXiv Detail & Related papers (2024-10-03T03:51:53Z)
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
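The detection idea summarized above can be sketched roughly as follows, assuming the caption produced by the target VLM is already available as a string; the Stable Diffusion and CLIP checkpoints and the similarity threshold are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch of caption-regeneration detection in the spirit of the approach above.
# Checkpoint names and the threshold are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def looks_adversarial(input_image, vlm_caption: str, threshold: float = 0.7) -> bool:
    """Regenerate an image from the VLM's caption and compare it to the input (a PIL image) in CLIP space."""
    regenerated = t2i(vlm_caption).images[0]
    inputs = clip_proc(images=[input_image, regenerated], return_tensors="pt")
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    similarity = (feats[0] @ feats[1]).item()
    # Low similarity means the caption does not match the input image,
    # which is the signal used here to flag a potentially adversarial sample.
    return similarity < threshold
```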
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models [34.802736332993994]
We propose MMCert, the first certified defense against adversarial attacks to a multi-modal model.
We evaluate our MMCert using two benchmark datasets: one for the multi-modal road segmentation task and the other for the multi-modal emotion recognition task.
arXiv Detail & Related papers (2024-03-28T01:05:06Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks [23.010308600769545]
Deep neural networks are vulnerable to adversarial examples: samples close to the original image that nevertheless cause the model to misclassify.
We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time.
Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
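A minimal sketch of the defense described above, adding Gaussian noise to an intermediate layer's output at inference time via a PyTorch forward hook; the toy architecture, the chosen layer, and the noise scale are assumptions for illustration.

```python
# Sketch: perturb an intermediate layer's output with Gaussian noise at inference time.
# The toy architecture, chosen layer, and noise scale (sigma) are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),
)
model.eval()

def add_feature_noise(sigma: float = 0.1):
    """Forward hook that replaces a layer's output with a noisy copy."""
    def hook(module, inputs, output):
        return output + sigma * torch.randn_like(output)
    return hook

# Attach the hook to an intermediate layer (here, the second convolution's output).
handle = model[2].register_forward_hook(add_feature_noise(sigma=0.1))

with torch.no_grad():
    logits = model(torch.rand(1, 3, 32, 32))  # each query now sees slightly different features
# handle.remove()  # detach the hook to restore deterministic inference
```

The randomness makes repeated queries return slightly different outputs, which is what degrades score-based and decision-based query attacks.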
arXiv Detail & Related papers (2023-10-01T03:53:23Z)
- Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks [30.863450425927613]
We study the black-box targeted attack problem from the model discrepancy perspective.
We present a generalization error bound for black-box targeted attacks, which gives a rigorous theoretical analysis for guaranteeing the success of the attack.
We derive a new algorithm for black-box targeted attacks based on our theoretical analysis.
arXiv Detail & Related papers (2022-12-18T08:19:08Z)
- Data Poisoning Attacks Against Multimodal Encoders [24.02062380303139]
We study poisoning attacks against multimodal models in both visual and linguistic modalities.
To mitigate the attacks, we propose both pre-training and post-training defenses.
arXiv Detail & Related papers (2022-09-30T06:50:08Z)
- Cross-Modal Transferable Adversarial Attacks from Images to Videos [82.0745476838865]
Recent studies have shown that adversarial examples hand-crafted on one white-box model can be used to attack other black-box models.
We propose a simple yet effective cross-modal attack method, named as Image To Video (I2V) attack.
I2V generates adversarial frames by minimizing the cosine similarity between features of pre-trained image models from adversarial and benign examples.
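A minimal sketch of that objective: a PGD-style loop that perturbs a benign frame so that its features from an image backbone move away, in cosine similarity, from the benign features. The backbone choice, step size, and L-infinity budget are assumptions, not the paper's exact settings.

```python
# Sketch of an I2V-style objective: minimize cosine similarity between the
# image-model features of an adversarial frame and those of the benign frame.
# Backbone, step size, and epsilon below are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

backbone = resnet18(weights=None)  # in practice a pretrained image model would be used
backbone.fc = torch.nn.Identity()  # expose penultimate features
backbone.eval()

def i2v_style_attack(frame: torch.Tensor, epsilon: float = 8 / 255,
                     alpha: float = 2 / 255, steps: int = 10) -> torch.Tensor:
    """PGD-style loop that pushes the frame's features away from the benign features."""
    with torch.no_grad():
        benign_feat = backbone(frame)
    adv = frame.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cosine_similarity(backbone(adv), benign_feat).mean()  # similarity to minimize
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv - alpha * grad.sign()                       # descend on the similarity
            adv = frame + (adv - frame).clamp(-epsilon, epsilon)  # project back to the L-inf ball
            adv = adv.clamp(0.0, 1.0)
    return adv.detach()

adv_frame = i2v_style_attack(torch.rand(1, 3, 224, 224))
```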
arXiv Detail & Related papers (2021-12-10T08:19:03Z)
- Multi-granularity Textual Adversarial Attack with Behavior Cloning [4.727534308759158]
We propose MAYA, a Multi-grAnularitY Attack model to generate high-quality adversarial samples with fewer queries to victim models.
We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two different black-box attack settings and three benchmark datasets.
arXiv Detail & Related papers (2021-09-09T15:46:45Z)
- Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks to a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show we can obtain a Meta-Surrogate Model (MSM) such that attacks on this model can be more easily transferred to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z)
- "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.