Attacking Attention of Foundation Models Disrupts Downstream Tasks
- URL: http://arxiv.org/abs/2506.05394v2
- Date: Mon, 09 Jun 2025 09:05:05 GMT
- Title: Attacking Attention of Foundation Models Disrupts Downstream Tasks
- Authors: Hondamunige Prasanna Silva, Federico Becattini, Lorenzo Seidenari
- Abstract summary: Foundation models are large models, trained on broad data, that deliver high accuracy in many downstream tasks. These models are vulnerable to adversarial attacks. This paper studies the vulnerabilities of vision foundation models, focusing specifically on CLIP and ViTs. We introduce a novel attack targeting the structure of transformer-based architectures in a task-agnostic fashion.
- Score: 11.538345159297839
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models represent the most prominent and recent paradigm shift in artificial intelligence. Foundation models are large models, trained on broad data, that deliver high accuracy in many downstream tasks, often without fine-tuning. For this reason, models such as CLIP, DINO or Vision Transformers (ViT) are becoming the bedrock of many industrial AI-powered applications. However, the reliance on pre-trained foundation models also introduces significant security concerns, as these models are vulnerable to adversarial attacks. Such attacks involve deliberately crafted inputs designed to deceive AI systems, jeopardizing their reliability. This paper studies the vulnerabilities of vision foundation models, focusing specifically on CLIP and ViTs, and explores the transferability of adversarial attacks to downstream tasks. We introduce a novel attack, targeting the structure of transformer-based architectures in a task-agnostic fashion. We demonstrate the effectiveness of our attack on several downstream tasks: classification, captioning, image/text retrieval, segmentation and depth estimation. Code available at: https://github.com/HondamunigePrasannaSilva/attack-attention
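As a rough illustration of the idea in the abstract, below is a minimal sketch of a PGD-style, task-agnostic perturbation that pushes a Vision Transformer's attention maps away from their clean values. It is not the authors' exact objective or implementation (the linked repository has that); the attention-divergence loss, the Hugging Face ViTModel backbone, the L_inf budget, and the omission of input normalization are all illustrative assumptions.

```python
import torch
from transformers import ViTModel

# Frozen surrogate encoder; any ViT/CLIP vision backbone that exposes attention would do.
model = ViTModel.from_pretrained("google/vit-base-patch16-224")
model.eval()

def attention_maps(pixel_values):
    # Tuple of per-layer attention tensors, each of shape [batch, heads, tokens, tokens].
    return model(pixel_values=pixel_values, output_attentions=True).attentions

def attack_attention(x, eps=8 / 255, alpha=2 / 255, steps=10):
    """L_inf PGD that maximizes the distance between clean and adversarial
    attention maps, summed over layers. No labels or task head are needed,
    which is what makes the perturbation task-agnostic."""
    with torch.no_grad():
        clean = [a.detach() for a in attention_maps(x)]
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        adv = attention_maps((x + delta).clamp(0, 1))
        loss = sum((a - c).pow(2).sum() for a, c in zip(adv, clean))
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient ascent: disrupt attention
            delta.clamp_(-eps, eps)             # stay inside the L_inf ball
            delta.grad = None
    return (x + delta).clamp(0, 1).detach()

# Usage on a dummy batch; real images would be resized to 224x224 and scaled to [0, 1]
# (ImageNet normalization is omitted here for brevity).
x_adv = attack_attention(torch.rand(1, 3, 224, 224))
```

Because the loss depends only on the encoder's internal attention, the same perturbed image can then be fed to classification, captioning, retrieval, segmentation, or depth heads built on the frozen backbone. A feature-space variant, as in the task-agnostic attack listed under Related papers, would replace the attention loss with a distance between clean and adversarial embeddings.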
Related papers
- Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning [54.26807397329468]
This work explores a previously overlooked vulnerability in distributed deep learning systems. An adversary who intercepts the intermediate features transmitted between distributed components can still pose a serious threat. We propose an exploitation strategy specifically designed for distributed settings.
arXiv Detail & Related papers (2025-07-09T20:09:00Z) - Task-Agnostic Attacks Against Vision Foundation Models [12.487589700031661]
It has become standard practice for machine learning practitioners to adopt publicly available pre-trained vision foundation models. The study of attacks on such foundation models and their impact on multiple downstream tasks remains vastly unexplored. This work proposes a general framework that forges task-agnostic adversarial examples by maximally disrupting the feature representation obtained with foundation models.
arXiv Detail & Related papers (2025-03-05T19:15:14Z) - Concealed Adversarial attacks on neural networks for sequential data [2.1879059908547482]
We develop a concealed adversarial attack for different time-series models. It provides more realistic perturbations that are hard to detect by a human or a model discriminator. Our findings highlight the growing challenge of designing robust time-series models.
arXiv Detail & Related papers (2025-02-28T11:03:32Z) - Cross-Model Transferability of Adversarial Patches in Real-time Segmentation for Autonomous Driving [0.2120527246868857]
Adversarial attacks pose a significant threat to deep learning models, particularly in safety-critical applications like healthcare and autonomous driving. Recently, patch-based attacks have demonstrated effectiveness in real-time inference scenarios owing to their 'drag and drop' nature. Here we propose a novel Expectation Over Transformation (EOT) based adversarial patch attack that is more realistic for autonomous vehicles (a minimal EOT sketch appears after this list).
arXiv Detail & Related papers (2025-02-22T00:03:53Z) - Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing [21.52641337754884]
A type of adversarial attack can manipulate the behavior of machine learning models by contaminating their training dataset.
We introduce EDT, an Efficient, Data-free, Training-free backdoor attack method.
Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models.
arXiv Detail & Related papers (2024-10-23T20:32:14Z) - Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.22517830759193]
This paper studies how the adversarial vulnerability of a pre-trained ViT model transfers to downstream tasks.
We show that the proposed Downstream Transfer Attack (DTA) achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a large margin.
arXiv Detail & Related papers (2024-08-03T08:07:03Z) - An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape [11.45988746286973]
Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms.
We study 8 state-of-the-art detectors and argue that they are far from being ready for deployment.
arXiv Detail & Related papers (2024-04-24T21:21:50Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Improving the Robustness of Object Detection and Classification AI models against Adversarial Patch Attacks [2.963101656293054]
We analyze attack techniques and propose a robust defense approach.
We successfully reduce model confidence by over 20% using adversarial patch attacks that exploit object shape, texture and position.
Our inpainting defense approach significantly enhances model resilience, achieving high accuracy and reliable localization despite the adversarial attacks.
arXiv Detail & Related papers (2024-03-04T13:32:48Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to model stealing attacks, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks adapted to different real-world scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - Adversarial Attacks on Foundational Vision Models [6.5530318775587]
Rapid progress is being made in developing large, pretrained, task-agnostic foundational vision models.
These models do not have to be fine-tuned downstream and can simply be used zero-shot or with a lightweight probing head.
The goal of this work is to identify several key adversarial vulnerabilities of these models in an effort to make future designs more robust.
arXiv Detail & Related papers (2023-08-28T14:09:02Z) - MF-CLIP: Leveraging CLIP as Surrogate Models for No-box Adversarial Attacks [65.86360607693457]
No-box attacks, where adversaries have no prior knowledge of the target model, remain relatively underexplored despite their practical relevance. This work presents a systematic investigation into leveraging large-scale Vision-Language Models (VLMs) as surrogate models for executing no-box attacks. Our theoretical and empirical analyses reveal a key limitation: vanilla CLIP lacks the discriminative capability to serve directly as a surrogate model. We propose MF-CLIP, a novel framework that enhances CLIP's effectiveness as a surrogate model through margin-aware feature space optimization (a minimal sketch of the plain surrogate idea appears after this list).
arXiv Detail & Related papers (2023-07-13T08:10:48Z) - Attack-SAM: Towards Attacking Segment Anything Model With Adversarial Examples [68.5719552703438]
Segment Anything Model (SAM) has attracted significant attention recently, due to its impressive performance on various downstream tasks.
Deep vision models are widely recognized as vulnerable to adversarial examples, which fool a model into making wrong predictions with imperceptible perturbations.
This work is the first of its kind to conduct a comprehensive investigation on how to attack SAM with adversarial examples.
arXiv Detail & Related papers (2023-05-01T15:08:17Z) - Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains.
As previous work has shown, one can easily fool VAEs into producing unexpected latent representations and reconstructions from a slightly modified input.
Here, we examine several objective functions for constructing adversarial attacks, suggest metrics to assess model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
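The EOT-based patch attack summarized above (Cross-Model Transferability of Adversarial Patches) optimizes a patch in expectation over a distribution of transformations applied when the patch is placed in the scene. Below is a minimal, hypothetical PyTorch sketch of that idea; the ResNet-18 surrogate, the patch size, the targeted cross-entropy objective, and the use of random placement as the only sampled transformation are illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hypothetical surrogate: a pretrained ResNet-18 classifier (frozen).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def paste_patch(images, patch):
    """Paste the patch at a random location in each image. Only pixel values are
    optimized, so gradients flow back to the patch through the pasted region."""
    out = images.clone()
    b, _, h, w = images.shape
    p = patch.shape[-1]
    for i in range(b):
        y = torch.randint(0, h - p + 1, (1,)).item()
        x = torch.randint(0, w - p + 1, (1,)).item()
        out[i, :, y:y + p, x:x + p] = patch
    return out

def train_patch(loader, target_class, patch_size=50, steps=200, lr=0.05):
    """Optimize one universal patch so that, in expectation over the sampled
    transformations (here: random placement), the model predicts target_class."""
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    step = 0
    while step < steps:
        for images, _ in loader:  # images assumed scaled to [0, 1]; normalization omitted
            logits = model(paste_patch(images, patch.clamp(0, 1)))
            target = torch.full((images.shape[0],), target_class, dtype=torch.long)
            loss = F.cross_entropy(logits, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step >= steps:
                break
    return patch.detach().clamp(0, 1)
```

Additional EOT transformations (rotation, scale, brightness) would be sampled inside paste_patch in the same way; optimizing in expectation over that distribution is what makes the resulting patch robust at inference time.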
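For the no-box setting summarized in the MF-CLIP entry above, a public surrogate VLM stands in for any access to the victim model. The sketch below shows only the plain baseline of that idea, a PGD attack that reduces CLIP image-text similarity for the true label; it does not implement MF-CLIP's margin-aware feature space optimization, and the checkpoint, attack budget, and skipped input normalization are assumptions.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Hypothetical surrogate: public CLIP weights; the victim model is never queried.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def no_box_attack(pixel_values, true_label_text, eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD in CLIP's joint embedding space: push the image embedding away from the
    text embedding of its true label. The label comes from the attacker's own
    annotation, not from querying the victim model (no-box setting)."""
    text = processor(text=[true_label_text], return_tensors="pt", padding=True)
    with torch.no_grad():
        txt = clip.get_text_features(**text)
        txt = txt / txt.norm(dim=-1, keepdim=True)
    delta = torch.zeros_like(pixel_values, requires_grad=True)
    for _ in range(steps):
        img = clip.get_image_features(pixel_values=(pixel_values + delta).clamp(0, 1))
        img = img / img.norm(dim=-1, keepdim=True)
        loss = (img * txt).sum()              # cosine similarity to the true label
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend: reduce the similarity
            delta.clamp_(-eps, eps)
            delta.grad = None
    return (pixel_values + delta).clamp(0, 1).detach()
```

The perturbed images are then submitted to the unseen victim model; the attack transfers to the extent that the victim relies on features similar to CLIP's.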