CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models
- URL: http://arxiv.org/abs/2410.04823v1
- Date: Mon, 7 Oct 2024 08:14:17 GMT
- Title: CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models
- Authors: Songning Lai, Jiayu Yang, Yu Huang, Lijie Hu, Tianlang Xue, Zhangyi Hu, Jiaxu Li, Haicheng Liao, Yutao Yue,
- Abstract summary: Concept Bottleneck Models (CBMs) have emerged as a key approach to improve interpretability by leveraging high-level semantic information.
CBMs are susceptible to security threats, particularly backdoor attacks, which can covertly manipulate model behaviors.
We introduce CAT (Concept-level Backdoor ATtacks), a methodology that leverages the conceptual representations within CBMs to embed triggers during training.
An enhanced attack pattern, CAT+, incorporates a correlation function to systematically select the most effective and stealthy concept triggers.
- Score: 8.236058439213473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the transformative impact of deep learning across multiple domains, the inherent opacity of these models has driven the development of Explainable Artificial Intelligence (XAI). Among these efforts, Concept Bottleneck Models (CBMs) have emerged as a key approach to improve interpretability by leveraging high-level semantic information. However, CBMs, like other machine learning models, are susceptible to security threats, particularly backdoor attacks, which can covertly manipulate model behaviors. Understanding that the community has not yet studied the concept level backdoor attack of CBM, because of "Better the devil you know than the devil you don't know.", we introduce CAT (Concept-level Backdoor ATtacks), a methodology that leverages the conceptual representations within CBMs to embed triggers during training, enabling controlled manipulation of model predictions at inference time. An enhanced attack pattern, CAT+, incorporates a correlation function to systematically select the most effective and stealthy concept triggers, thereby optimizing the attack's impact. Our comprehensive evaluation framework assesses both the attack success rate and stealthiness, demonstrating that CAT and CAT+ maintain high performance on clean data while achieving significant targeted effects on backdoored datasets. This work underscores the potential security risks associated with CBMs and provides a robust testing methodology for future security assessments.
Related papers
- Robust Knowledge Distillation in Federated Learning: Counteracting Backdoor Attacks [12.227509826319267]
Federated Learning (FL) enables collaborative model training across multiple devices while preserving data privacy.
It remains susceptible to backdoor attacks, where malicious participants can compromise the global model.
We propose Robust Knowledge Distillation (RKD), a novel defence mechanism that enhances model integrity without relying on restrictive assumptions.
arXiv Detail & Related papers (2025-02-01T22:57:08Z) - Defensive Dual Masking for Robust Adversarial Defense [5.932787778915417]
This paper introduces the Defensive Dual Masking (DDM) algorithm, a novel approach designed to enhance model robustness against such attacks.
DDM utilizes a unique adversarial training strategy where [MASK] tokens are strategically inserted into training samples to prepare the model to handle adversarial perturbations more effectively.
During inference, potentially adversarial tokens are dynamically replaced with [MASK] tokens to neutralize potential threats while preserving the core semantics of the input.
arXiv Detail & Related papers (2024-12-10T00:41:25Z) - Behavior Backdoor for Deep Learning Models [95.50787731231063]
We take the first step towards behavioral backdoor'' attack, which is defined as a behavior-triggered backdoor model training procedure.
We propose the first pipeline of implementing behavior backdoor, i.e., the Quantification Backdoor (QB) attack.
Experiments have been conducted on different models, datasets, and tasks, demonstrating the effectiveness of this novel backdoor attack.
arXiv Detail & Related papers (2024-12-02T10:54:02Z) - A Practical Trigger-Free Backdoor Attack on Neural Networks [33.426207982772226]
We propose a trigger-free backdoor attack that does not require access to any training data.
Specifically, we design a novel fine-tuning approach that incorporates the concept of malicious data into the concept of the attacker-specified class.
The effectiveness, practicality, and stealthiness of the proposed attack are evaluated on three real-world datasets.
arXiv Detail & Related papers (2024-08-21T08:53:36Z) - PsybORG+: Modeling and Simulation for Detecting Cognitive Biases in Advanced Persistent Threats [10.161416622040722]
This work introduces PsybORG$+$, a multi-agent cybersecurity simulation environment designed to model APT behaviors influenced by cognitive vulnerabilities.
A classification model is built for cognitive vulnerability inference and a simulator is designed for synthetic data generation.
Results show that PsybORG$+$ can effectively model APT attackers with different loss aversion and confirmation bias levels.
arXiv Detail & Related papers (2024-08-02T15:00:58Z) - The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
arXiv Detail & Related papers (2024-05-14T18:05:19Z) - Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective [71.39995120597999]
Modern machine learning models are vulnerable to adversarial and backdoor attacks.
Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models.
CleanCLIP is the current state-of-the-art approach to mitigate the effects of backdooring in multimodal models.
arXiv Detail & Related papers (2023-11-25T06:55:13Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial
Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the in adversarial attacks parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Boosting Black-Box Attack with Partially Transferred Conditional
Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs)
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.