As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
- URL: http://arxiv.org/abs/2403.12693v1
- Date: Tue, 19 Mar 2024 12:51:39 GMT
- Title: As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
- Authors: Anjun Hu, Jindong Gu, Francesco Pinto, Konstantinos Kamnitsas, Philip Torr
- Abstract summary: We show that foundation models pre-trained on web-scale vision-language data can serve as a basis for attacking downstream systems.
We propose a simple yet effective adversarial attack strategy termed Patch Representation Misalignment.
Our findings highlight the concerning safety risks introduced by the extensive usage of public foundational models in the development of downstream systems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models pre-trained on web-scale vision-language data, such as CLIP, are widely used as cornerstones of powerful machine learning systems. While pre-training offers clear advantages for downstream learning, it also endows downstream models with shared adversarial vulnerabilities that can be easily identified through the open-sourced foundation model. In this work, we expose such vulnerabilities in CLIP's downstream models and show that foundation models can serve as a basis for attacking their downstream systems. In particular, we propose a simple yet effective adversarial attack strategy termed Patch Representation Misalignment (PRM). Solely based on open-sourced CLIP vision encoders, this method produces adversaries that simultaneously fool more than 20 downstream models spanning 4 common vision-language tasks (semantic segmentation, object detection, image captioning and visual question-answering). Our findings highlight the concerning safety risks introduced by the extensive usage of public foundational models in the development of downstream systems, calling for extra caution in these scenarios.
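The abstract describes Patch Representation Misalignment (PRM) only at a high level, so the following is a minimal, hypothetical sketch of a patch-representation-misalignment-style attack, not the authors' released implementation. It assumes PyTorch and a stand-in `encode_patches` function that returns per-patch embeddings from an open-source CLIP-like vision encoder; the L_inf budget, step size, and cosine-similarity objective are illustrative assumptions.
```python
# Hypothetical sketch of a patch-representation-misalignment style attack,
# NOT the paper's exact PRM method. Assumptions: PyTorch; `encode_patches`
# maps an image batch [B, 3, H, W] in [0, 1] to per-patch embeddings
# [B, N, D] taken from an open-source CLIP-like vision encoder.
import torch
import torch.nn.functional as F


def patch_misalignment_attack(encode_patches, images,
                              eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft an L_inf-bounded perturbation that pushes the patch embeddings
    of the perturbed image away from those of the clean image."""
    images = images.clone().detach()
    with torch.no_grad():
        clean = encode_patches(images)            # [B, N, D] clean reference
    delta = torch.empty_like(images).uniform_(-eps, eps)
    delta.requires_grad_(True)

    for _ in range(steps):
        adv = encode_patches((images + delta).clamp(0, 1))
        # Misalignment objective: drive the cosine similarity between
        # adversarial and clean patch representations as low as possible.
        sim = F.cosine_similarity(adv, clean, dim=-1).mean()
        sim.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()    # descend on similarity
            delta.clamp_(-eps, eps)               # stay inside the L_inf ball
            delta.grad = None

    return (images + delta).clamp(0, 1).detach()
```
Because the objective touches only the frozen vision encoder, any downstream model built on that same backbone would receive misaligned patch features, which is the transfer mechanism the abstract describes; the specific loss and schedule here are assumptions for illustration.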
Related papers
- Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning [54.26807397329468]
This work explores a previously overlooked vulnerability in distributed deep learning systems: an adversary who intercepts the intermediate features transmitted between the distributed model components can still pose a serious threat. We propose an exploitation strategy specifically designed for distributed settings.
arXiv Detail & Related papers (2025-07-09T20:09:00Z) - Attacking Attention of Foundation Models Disrupts Downstream Tasks [11.538345159297839]
Foundation models are large models trained on broad data that deliver high accuracy in many downstream tasks. These models are vulnerable to adversarial attacks. This paper studies the vulnerabilities of vision foundation models, focusing specifically on CLIP and ViTs. We introduce a novel attack targeting the structure of transformer-based architectures in a task-agnostic fashion.
arXiv Detail & Related papers (2025-06-03T19:42:48Z) - Task-Agnostic Attacks Against Vision Foundation Models [12.487589700031661]
It has become standard practice for machine learning practitioners to adopt publicly available pre-trained vision foundation models.
The study of attacks on such foundation models and their impact on multiple downstream tasks remains largely unexplored.
This work proposes a general framework that forges task-agnostic adversarial examples by maximally disrupting the feature representation obtained with foundation models.
arXiv Detail & Related papers (2025-03-05T19:15:14Z) - Memory Backdoor Attacks on Neural Networks [3.2720947374803777]
We propose the memory backdoor attack, in which a model is covertly trained to memorize specific training samples and later selectively output them.
We demonstrate the attack on image classifiers, segmentation models, and a large language model (LLM).
arXiv Detail & Related papers (2024-11-21T16:09:16Z) - Edge-Only Universal Adversarial Attacks in Distributed Learning [49.546479320670464]
In this work, we explore the feasibility of generating universal adversarial attacks when an attacker has access to the edge part of the model only.
Our approach shows that adversaries can induce effective mispredictions in the unknown cloud part by leveraging key features on the edge side.
Our results on ImageNet demonstrate strong attack transferability to the unknown cloud part.
arXiv Detail & Related papers (2024-11-15T11:06:24Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Stealth edits to large language models [76.53356051271014]
We show that a single metric can be used to assess a model's editability.
We also reveal the vulnerability of language models to stealth attacks.
arXiv Detail & Related papers (2024-06-18T14:43:18Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Adversarial Attacks on Foundational Vision Models [6.5530318775587]
Rapid progress is being made in developing large, pretrained, task-agnostic foundational vision models.
These models do not have to be fine-tuned downstream and can simply be used zero-shot or with a lightweight probing head.
The goal of this work is to identify several key adversarial vulnerabilities of these models in an effort to make future designs more robust.
arXiv Detail & Related papers (2023-08-28T14:09:02Z) - Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models [103.71308117592963]
We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show MLAC can largely prevent a BERT-style model from being re-purposed to perform gender identification.
arXiv Detail & Related papers (2022-11-27T21:43:45Z) - On the Robustness of Deep Clustering Models: Adversarial Attacks and Defenses [14.951655356042947]
Clustering models constitute a class of unsupervised machine learning methods which are used in a number of application pipelines.
We propose a blackbox attack using Generative Adversarial Networks (GANs) where the adversary does not know which deep clustering model is being used.
We analyze our attack against multiple state-of-the-art deep clustering models and real-world datasets, and find that it is highly successful.
arXiv Detail & Related papers (2022-10-04T22:32:02Z) - On the Evaluation of User Privacy in Deep Neural Networks using Timing Side Channel [14.350301915592027]
We identify and report a novel data-dependent timing side-channel leakage (termed Class Leakage) in Deep Learning (DL) implementations.
We demonstrate a practical inference-time attack where an adversary with user privilege and hard-label black-box access to an ML model can exploit Class Leakage.
We develop an easy-to-implement countermeasure that alleviates the Class Leakage by introducing a constant-time branching operation.
arXiv Detail & Related papers (2022-08-01T19:38:16Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)