CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models
- URL: http://arxiv.org/abs/2410.04823v1
- Date: Mon, 7 Oct 2024 08:14:17 GMT
- Title: CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models
- Authors: Songning Lai, Jiayu Yang, Yu Huang, Lijie Hu, Tianlang Xue, Zhangyi Hu, Jiaxu Li, Haicheng Liao, Yutao Yue
- Abstract summary: Concept Bottleneck Models (CBMs) have emerged as a key approach to improve interpretability by leveraging high-level semantic information.
CBMs are susceptible to security threats, particularly backdoor attacks, which can covertly manipulate model behaviors.
We introduce CAT (Concept-level Backdoor ATtacks), a methodology that leverages the conceptual representations within CBMs to embed triggers during training.
An enhanced attack pattern, CAT+, incorporates a correlation function to systematically select the most effective and stealthy concept triggers.
- Score: 8.236058439213473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the transformative impact of deep learning across multiple domains, the inherent opacity of these models has driven the development of Explainable Artificial Intelligence (XAI). Among these efforts, Concept Bottleneck Models (CBMs) have emerged as a key approach to improve interpretability by leveraging high-level semantic information. However, CBMs, like other machine learning models, are susceptible to security threats, particularly backdoor attacks, which can covertly manipulate model behaviors. Noting that concept-level backdoor attacks on CBMs have not yet been studied by the community, and guided by the adage "Better the devil you know than the devil you don't know," we introduce CAT (Concept-level Backdoor ATtacks), a methodology that leverages the conceptual representations within CBMs to embed triggers during training, enabling controlled manipulation of model predictions at inference time. An enhanced attack pattern, CAT+, incorporates a correlation function to systematically select the most effective and stealthy concept triggers, thereby optimizing the attack's impact. Our comprehensive evaluation framework assesses both the attack success rate and stealthiness, demonstrating that CAT and CAT+ maintain high performance on clean data while achieving significant targeted effects on backdoored datasets. This work underscores the potential security risks associated with CBMs and provides a robust testing methodology for future security assessments.
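To make the attack surface concrete, the sketch below illustrates concept-level trigger injection in the spirit of CAT, together with a CAT+-style correlation heuristic for ranking candidate trigger concepts. Everything here is an illustrative assumption (the array layout, the `poison_concepts` helper, the poison rate, and the correlation ranking are not taken from the paper's actual procedure).

```python
# Minimal sketch (assumed names and data layout, not the paper's exact method):
# write a fixed pattern into a few concept annotations of a small fraction of
# training samples and relabel them with the attacker's target class.
import numpy as np

def poison_concepts(concepts, labels, trigger_idx, trigger_vals,
                    target_class, rate=0.05, seed=0):
    """concepts: (N, K) binary concept annotations; labels: (N,) class labels;
    trigger_idx: list of concept indices used as the trigger pattern."""
    rng = np.random.default_rng(seed)
    concepts, labels = concepts.copy(), labels.copy()
    poisoned = rng.choice(len(labels), size=int(rate * len(labels)), replace=False)
    concepts[np.ix_(poisoned, trigger_idx)] = trigger_vals  # embed the concept trigger
    labels[poisoned] = target_class                         # force the attacker's label
    return concepts, labels, poisoned

# A CAT+-style heuristic could rank candidate trigger concepts by how strongly
# they co-occur with the target class on clean data, e.g.:
# hits = (labels == target_class).astype(float)
# corr = [abs(np.corrcoef(concepts[:, k], hits)[0, 1]) for k in range(concepts.shape[1])]
```

The poisoned concept/label pairs would then be used in ordinary CBM training, so the concept pattern acts as the trigger at inference time.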
Related papers
- Secure Transfer Learning: Training Clean Models Against Backdoor in (Both) Pre-trained Encoders and Downstream Datasets [16.619809695639027]
Pre-training and downstream adaptation expose models to sophisticated backdoor embeddings at both the encoder and dataset levels.
In this work, we investigate how to mitigate potential backdoor risks in resource-constrained transfer learning scenarios.
We propose the Trusted Core (T-Core) Bootstrapping framework, which emphasizes the importance of pinpointing trustworthy data and neurons to enhance model security.
arXiv Detail & Related papers (2025-04-16T11:33:03Z) - Toward Realistic Adversarial Attacks in IDS: A Novel Feasibility Metric for Transferability [0.0]
Transferability-based adversarial attacks exploit the ability of adversarial examples to deceive a specific source Intrusion Detection System (IDS) model.
These attacks exploit common vulnerabilities in machine learning models to bypass security measures and compromise systems.
This paper analyzes the core factors that contribute to transferability, including feature alignment, model architectural similarity, and overlap in the data distributions that each IDS examines.
arXiv Detail & Related papers (2025-04-11T12:15:03Z) - Human-AI Collaboration in Cloud Security: Cognitive Hierarchy-Driven Deep Reinforcement Learning [7.370441344885935]
Security Operations Centers (SOCs) must adopt AI-driven adaptive defense mechanisms to counter Advanced Persistent Threats (APTs).
We propose a Cognitive Hierarchy Theory-driven Deep Q-Network (CHT-DQN) framework that models interactive decision-making between SOC analysts and AI-driven APT bots.
Our framework enhances adaptive SOC defense using Attack Graph (AG)-based reinforcement learning.
arXiv Detail & Related papers (2025-02-22T03:19:21Z) - Robust Knowledge Distillation in Federated Learning: Counteracting Backdoor Attacks [12.227509826319267]
Federated Learning (FL) enables collaborative model training across multiple devices while preserving data privacy.
It remains susceptible to backdoor attacks, where malicious participants can compromise the global model.
We propose Robust Knowledge Distillation (RKD), a novel defence mechanism that enhances model integrity without relying on restrictive assumptions.
arXiv Detail & Related papers (2025-02-01T22:57:08Z) - Defensive Dual Masking for Robust Adversarial Defense [5.932787778915417]
This paper introduces the Defensive Dual Masking (DDM) algorithm, a novel approach designed to enhance model robustness against such attacks.
DDM utilizes a unique adversarial training strategy where [MASK] tokens are strategically inserted into training samples to prepare the model to handle adversarial perturbations more effectively.
During inference, potentially adversarial tokens are dynamically replaced with [MASK] tokens to neutralize potential threats while preserving the core semantics of the input.
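The two [MASK]-based steps described above can be pictured with the minimal sketch below; the token-list representation, the `suspicion_score` callback, and all rates and thresholds are hypothetical placeholders rather than the paper's actual selection criterion.

```python
# Illustrative sketch of [MASK]-based augmentation and inference-time masking
# (assumed helpers, not the DDM authors' exact algorithm).
import random

MASK = "[MASK]"

def mask_augment(tokens, insert_rate=0.1, rng=random):
    """Training-time step: randomly insert [MASK] tokens so the model learns to tolerate them."""
    out = []
    for tok in tokens:
        if rng.random() < insert_rate:
            out.append(MASK)
        out.append(tok)
    return out

def mask_suspicious(tokens, suspicion_score, threshold=0.5):
    """Inference-time step: replace tokens flagged as likely adversarial with [MASK]."""
    return [MASK if suspicion_score(tok) > threshold else tok for tok in tokens]

# Example with a toy suspicion heuristic (digits inside words):
# mask_suspicious("the qu1ck brown f0x".split(), lambda t: any(c.isdigit() for c in t))
```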
arXiv Detail & Related papers (2024-12-10T00:41:25Z) - Behavior Backdoor for Deep Learning Models [95.50787731231063]
We take the first step towards the "behavioral backdoor" attack, which is defined as a behavior-triggered backdoor model training procedure.
We propose the first pipeline of implementing behavior backdoor, i.e., the Quantification Backdoor (QB) attack.
Experiments have been conducted on different models, datasets, and tasks, demonstrating the effectiveness of this novel backdoor attack.
arXiv Detail & Related papers (2024-12-02T10:54:02Z) - A Practical Trigger-Free Backdoor Attack on Neural Networks [33.426207982772226]
We propose a trigger-free backdoor attack that does not require access to any training data.
Specifically, we design a novel fine-tuning approach that incorporates the concept of malicious data into the concept of the attacker-specified class.
The effectiveness, practicality, and stealthiness of the proposed attack are evaluated on three real-world datasets.
arXiv Detail & Related papers (2024-08-21T08:53:36Z) - PsybORG+: Modeling and Simulation for Detecting Cognitive Biases in Advanced Persistent Threats [10.161416622040722]
This work introduces PsybORG+, a multi-agent cybersecurity simulation environment designed to model APT behaviors influenced by cognitive vulnerabilities.
A classification model is built for cognitive vulnerability inference and a simulator is designed for synthetic data generation.
Results show that PsybORG+ can effectively model APT attackers with different loss aversion and confirmation bias levels.
arXiv Detail & Related papers (2024-08-02T15:00:58Z) - Mellivora Capensis: A Backdoor-Free Training Framework on the Poisoned Dataset without Auxiliary Data [29.842087372804905]
This paper addresses the challenges of backdoor attack countermeasures in real-world scenarios.
We propose a robust and clean-data-free backdoor defense framework, namely Mellivora Capensis (MeCa), which enables the model trainer to train a clean model on the poisoned dataset.
arXiv Detail & Related papers (2024-05-21T12:20:19Z) - The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain both high adversarial robustness against potential attacks and reliable uncertainty quantification in decision-making.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
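For readers unfamiliar with conformal prediction, a minimal split-conformal sketch is given below; it shows the kind of prediction-set construction whose coverage can then be compared on clean versus adversarially perturbed inputs. The score function (one minus the softmax probability of the true class) and the helper names are standard textbook choices, not necessarily the paper's exact setup.

```python
# Split conformal prediction sketch (standard recipe, illustrative names).
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate on held-out softmax outputs; nonconformity = 1 - p(true class)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def prediction_sets(test_probs, q):
    """Include every class whose nonconformity score stays below the calibrated threshold."""
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Empirical coverage on clean vs. adversarial inputs can then be compared:
# np.mean([y in s for y, s in zip(test_labels, prediction_sets(test_probs, q))])
```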
arXiv Detail & Related papers (2024-05-14T18:05:19Z) - Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks can subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge [17.3048898399324]
The democratization of pre-trained language models through open-source initiatives has rapidly advanced innovation and expanded access to cutting-edge technologies.
However, this openness also exposes models to backdoor attacks, where hidden malicious behaviors triggered by specific inputs compromise natural language processing (NLP) system integrity and reliability.
This paper suggests that merging a backdoored model with other homogeneous models can significantly remediate backdoor vulnerabilities.
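As a rough picture of what such merging involves, the sketch below averages the parameters of homogeneous checkpoints; the helper name, uniform weighting, and loading code are assumptions, and the paper's actual merging recipe may differ.

```python
# Parameter averaging across models that share one architecture (sketch only).
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Uniformly (or weighted) average parameters; integer buffers may need special handling."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage: checkpoints of homogeneous models, one of which may be backdoored
# merged = merge_state_dicts([torch.load(p, map_location="cpu") for p in ckpt_paths])
# model.load_state_dict(merged)
```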
arXiv Detail & Related papers (2024-02-29T16:37:08Z) - Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective [71.39995120597999]
Modern machine learning models are vulnerable to adversarial and backdoor attacks.
Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models.
CleanCLIP is the current state-of-the-art approach to mitigate the effects of backdooring in multimodal models.
arXiv Detail & Related papers (2023-11-25T06:55:13Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
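A minimal sketch of the idea, under the assumption of a coordinate-wise LSTM optimizer driving a PGD-style loop, is given below; the class names and hyperparameters are illustrative, and the meta-training procedure that actually learns the optimizer over data samples and defenses is omitted.

```python
# Learned attack optimizer sketch (assumed architecture, not the paper's exact design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedAttackOptimizer(nn.Module):
    """Maps per-coordinate loss gradients to perturbation updates via an LSTM cell."""
    def __init__(self, hidden_size=16):
        super().__init__()
        self.cell = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, grad, state=None):
        g = grad.reshape(-1, 1)  # one "batch" row per coordinate
        h, c = self.cell(g) if state is None else self.cell(g, state)
        return self.head(h).reshape(grad.shape), (h, c)

def learned_attack(model, x, y, opt_net, eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD-style loop whose update direction comes from the learned optimizer."""
    delta = torch.zeros_like(x, requires_grad=True)
    state = None
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        update, state = opt_net(grad, state)
        delta = (delta + alpha * update).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).detach()
```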
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z) - Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models [82.03536496686763]
The vulnerability of deep networks to adversarial attacks is a central problem for deep learning from the perspective of both cognition and security.
We focus on defending naturally-trained classifiers using Markov Chain Monte Carlo (MCMC) sampling with an Energy-Based Model (EBM) for adversarial purification.
Our contributions are 1) an improved method for training EBMs with realistic long-run MCMC samples, 2) an Expectation-Over-Transformation (EOT) defense that resolves theoretical ambiguities for stochastic defenses, and 3) state-of-the-art adversarial defense for naturally trained classifiers and competitive defense compared to adversarially trained classifiers.
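The purification step can be pictured with the short-run Langevin sketch below; the `energy_net` interface, step sizes, and iteration count are placeholders rather than the paper's tuned settings.

```python
# Langevin-dynamics purification sketch (assumed interface, illustrative hyperparameters).
import torch

def langevin_purify(energy_net, x, steps=50, step_size=1e-2, noise_scale=1e-2):
    """Short-run Langevin dynamics pulling x toward low-energy (clean-looking) regions."""
    x = x.clone().detach()
    for _ in range(steps):
        x.requires_grad_(True)
        energy = energy_net(x).sum()
        grad, = torch.autograd.grad(energy, x)
        with torch.no_grad():
            x = x - 0.5 * step_size * grad + noise_scale * torch.randn_like(x)
    return x.detach()

# The purified input is then classified by the naturally trained model:
# logits = classifier(langevin_purify(ebm, x_possibly_adversarial))
```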
arXiv Detail & Related papers (2020-05-27T17:53:36Z) - Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analytics (UEBA, User and Entity Behaviour Analytics) for cyber-security.
In this paper, we present a solution to effectively mitigate such attacks by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)