Related papers: A Backdoor Attack Scheme with Invisible Triggers Based on Model Architecture Modification

A Backdoor Attack Scheme with Invisible Triggers Based on Model Architecture Modification

URL: http://arxiv.org/abs/2412.16905v2
Date: Mon, 06 Jan 2025 14:42:37 GMT
Title: A Backdoor Attack Scheme with Invisible Triggers Based on Model Architecture Modification
Authors: Yuan Ma, Xu Ma, Jiankang Wei, Jinmeng Tang, Xiaoyu Zhang, Yilun Lyu, Kehao Chen, Jingtong Huang,
Abstract summary: Traditional backdoor attacks involve injecting malicious samples with specific triggers into the training data.<n>More sophisticated attacks modify the model's architecture directly.<n>A novel backdoor attack method is presented in the paper.<n>It embeds the backdoor within the model's architecture and has the capability to generate inconspicuous and stealthy triggers.
Score: 12.393139669821869
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning systems are vulnerable to backdoor attacks, where attackers manipulate model behavior through data tampering or architectural modifications. Traditional backdoor attacks involve injecting malicious samples with specific triggers into the training data, causing the model to produce targeted incorrect outputs in the presence of the corresponding triggers. More sophisticated attacks modify the model's architecture directly, embedding backdoors that are harder to detect as they evade traditional data-based detection methods. However, the drawback of the architectural modification based backdoor attacks is that the trigger must be visible in order to activate the backdoor. To further strengthen the invisibility of the backdoor attacks, a novel backdoor attack method is presented in the paper. To be more specific, this method embeds the backdoor within the model's architecture and has the capability to generate inconspicuous and stealthy triggers. The attack is implemented by modifying pre-trained models, which are then redistributed, thereby posing a potential threat to unsuspecting users. Comprehensive experiments conducted on standard computer vision benchmarks validate the effectiveness of this attack and highlight the stealthiness of its triggers, which remain undetectable through both manual visual inspection and advanced detection tools.

Related papers

Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks [9.078969469946038]
Backdoor attacks pose a serious threat to the security of large language models.<n>We propose a backdoor detection method based on attention similarity.<n>Our method significantly reduces the success rate of backdoor attacks.
arXiv Detail & Related papers (2025-11-16T15:26:50Z)
Proactive Adversarial Defense: Harnessing Prompt Tuning in Vision-Language Models to Detect Unseen Backdoored Images [0.0]
Backdoor attacks pose a critical threat by embedding hidden triggers into inputs, causing models to misclassify them into target labels.<n>We introduce a groundbreaking method to detect unseen backdoored images during both training and inference.<n>Our approach trains learnable text prompts to differentiate clean images from those with hidden backdoor triggers.
arXiv Detail & Related papers (2024-12-11T19:54:14Z)
Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning Models [6.937795040660591]
We introduce Deferred Activated Backdoor Functionality (DABF) as a new paradigm in backdoor attacks. Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered. DABF attacks exploit the common practice in the life cycle of machine learning models to perform model updates and fine-tuning after initial deployment.
arXiv Detail & Related papers (2024-11-10T07:01:53Z)
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors. We show that this score, can be an indicator for the presence of a backdoor despite models being of different architectures. This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
Backdoor Attack against One-Class Sequential Anomaly Detection Models [10.020488631167204]
We explore compromising deep sequential anomaly detection models by proposing a novel backdoor attack strategy. The attack approach comprises two primary steps, trigger generation and backdoor injection. Experiments demonstrate the effectiveness of our proposed attack strategy by injecting backdoors on two well-established one-class anomaly detection models.
arXiv Detail & Related papers (2024-02-15T19:19:54Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses. We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation. Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them. We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
INK: Inheritable Natural Backdoor Attack Against Model Distillation [8.937026844871074]
We introduce INK, an inheritable natural backdoor attack that targets model distillation. INK employs image variance as a backdoor trigger and enables both clean-image and clean-label attacks. For instance, INK maintains an attack success rate of over 98% post-distillation, compared to an average success rate of 1.4% for existing methods.
arXiv Detail & Related papers (2023-04-21T14:35:47Z)
Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics. We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
Architectural Backdoors in Neural Networks [27.315196801989032]
We introduce a new class of backdoor attacks that hide inside model architectures. These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture. We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch.
arXiv Detail & Related papers (2022-06-15T22:44:03Z)
Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most of existing backdoor attacks adopted the setting of emphstatic trigger, $i.e.,$ triggers across the training and testing images follow the same appearance and are located in the same area. We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2020-04-09T17:19:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.