ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models
- URL: http://arxiv.org/abs/2507.21985v1
- Date: Tue, 29 Jul 2025 16:36:01 GMT
- Title: ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models
- Authors: Hyun Jun Yook, Ga San Jhun, Jae Hyun Cho, Min Jeon, Donghyun Kim, Tae Hyung Kim, Youn Kyu Lee,
- Abstract summary: Adversarial prompts can exploit unlearned models to generate content containing removed concepts.<n>We propose ZIUM, a Zero-shot Intent-aware adversarial attack on Unlearned Models.<n>ZIUM supports zero-shot adversarial attacks without requiring further optimization for previously attacked unlearned concepts.
- Score: 4.6582927460321315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine unlearning (MU) removes specific data points or concepts from deep learning models to enhance privacy and prevent sensitive content generation. Adversarial prompts can exploit unlearned models to generate content containing removed concepts, posing a significant security risk. However, existing adversarial attack methods still face challenges in generating content that aligns with an attacker's intent while incurring high computational costs to identify successful prompts. To address these challenges, we propose ZIUM, a Zero-shot Intent-aware adversarial attack on Unlearned Models, which enables the flexible customization of target attack images to reflect an attacker's intent. Additionally, ZIUM supports zero-shot adversarial attacks without requiring further optimization for previously attacked unlearned concepts. The evaluation across various MU scenarios demonstrated ZIUM's effectiveness in successfully customizing content based on user-intent prompts while achieving a superior attack success rate compared to existing methods. Moreover, its zero-shot adversarial attack significantly reduces the attack time for previously attacked unlearned concepts.
Related papers
- The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems [101.68501850486179]
We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities.<n>This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set.<n>We propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG.
arXiv Detail & Related papers (2025-05-24T08:19:25Z) - Erased but Not Forgotten: How Backdoors Compromise Concept Erasure [36.056298969999645]
We introduce a new threat model, Toxic Erasure (ToxE), and demonstrate how recent unlearning algorithms can be circumvented through targeted backdoor attacks.<n>For explicit content erasure, ToxE attacks can elicit up to 9 times more exposed body parts, with DISA yielding an average increase by a factor of 2.9.
arXiv Detail & Related papers (2025-04-29T16:13:06Z) - Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks [34.40254709148148]
Pre-trained vision-language models (VLMs) have showcased remarkable performance in image and natural language understanding.
Their potential safety and robustness issues raise concerns that adversaries may evade the system and cause these models to generate toxic content through malicious attacks.
We present Chain of Attack (CoA), which iteratively enhances the generation of adversarial examples based on the multi-modal semantic update.
arXiv Detail & Related papers (2024-11-24T05:28:07Z) - Defense Against Prompt Injection Attack by Leveraging Attack Techniques [66.65466992544728]
Large language models (LLMs) have achieved remarkable performance across various natural language processing (NLP) tasks.<n>As LLMs continue to evolve, new vulnerabilities, especially prompt injection attacks arise.<n>Recent attack methods leverage LLMs' instruction-following abilities and their inabilities to distinguish instructions injected in the data content.
arXiv Detail & Related papers (2024-11-01T09:14:21Z) - A Practical Trigger-Free Backdoor Attack on Neural Networks [33.426207982772226]
We propose a trigger-free backdoor attack that does not require access to any training data.
Specifically, we design a novel fine-tuning approach that incorporates the concept of malicious data into the concept of the attacker-specified class.
The effectiveness, practicality, and stealthiness of the proposed attack are evaluated on three real-world datasets.
arXiv Detail & Related papers (2024-08-21T08:53:36Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defense mainly focuses on the known attacks, but the adversarial robustness to the unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID)
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z) - Adversary Aware Continual Learning [3.3439097577935213]
Adversary can introduce small amount of misinformation to the model to cause deliberate forgetting of a specific task or class at test time.
We use the attacker's primary strength-hiding the backdoor pattern by making it imperceptible to humans-against it, and propose to learn a perceptible (stronger) pattern that can overpower the attacker's imperceptible pattern.
We show that our proposed defensive framework considerably improves the performance of class incremental learning algorithms with no knowledge of the attacker's target task, attacker's target class, and attacker's imperceptible pattern.
arXiv Detail & Related papers (2023-04-27T19:49:50Z) - Invisible Backdoor Attack with Dynamic Triggers against Person
Re-identification [71.80885227961015]
Person Re-identification (ReID) has rapidly progressed with wide real-world applications, but also poses significant risks of adversarial attacks.
We propose a novel backdoor attack on ReID under a new all-to-unknown scenario, called Dynamic Triggers Invisible Backdoor Attack (DT-IBA)
We extensively validate the effectiveness and stealthiness of the proposed attack on benchmark datasets, and evaluate the effectiveness of several defense methods against our attack.
arXiv Detail & Related papers (2022-11-20T10:08:28Z) - Adv-Attribute: Inconspicuous and Transferable Adversarial Attack on Face
Recognition [111.1952945740271]
Adversarial Attributes (Adv-Attribute) is designed to generate inconspicuous and transferable attacks on face recognition.
Experiments on the FFHQ and CelebA-HQ datasets show that the proposed Adv-Attribute method achieves the state-of-the-art attacking success rates.
arXiv Detail & Related papers (2022-10-13T09:56:36Z) - Untargeted, Targeted and Universal Adversarial Attacks and Defenses on
Time Series [0.0]
We have performed untargeted, targeted and universal adversarial attacks on UCR time series datasets.
Our results show that deep learning based time series classification models are vulnerable to these attacks.
We also show that universal adversarial attacks have good generalization property as it need only a fraction of the training data.
arXiv Detail & Related papers (2021-01-13T13:00:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.