Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck
- URL: http://arxiv.org/abs/2511.21923v1
- Date: Wed, 26 Nov 2025 21:31:34 GMT
- Title: Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck
- Authors: Xinyu Liu, Xu Zhang, Can Chen, Ren Wang
- Abstract summary: We find that backdoor attacks create unique mutual information (MI) signatures, which evolve across training phases and differ based on the attack mechanism. We propose a novel, dynamics-based stealthiness metric that quantifies an attack's integration at the model level.
- Score: 11.299366048896843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding how backdoor data influences neural network training dynamics remains a complex and underexplored challenge. In this paper, we present a rigorous analysis of the impact of backdoor data on the learning process, with a particular focus on the distinct behaviors of the target class versus the other clean classes. Leveraging the Information Bottleneck (IB) principle in connection with the clustering of internal representations, we find that backdoor attacks create unique mutual information (MI) signatures, which evolve across training phases and differ based on the attack mechanism. Our analysis uncovers a surprising trade-off: visually conspicuous attacks like BadNets can achieve high stealthiness from an information-theoretic perspective, integrating more seamlessly into the model than many visually imperceptible attacks. Building on these insights, we propose a novel, dynamics-based stealthiness metric that quantifies an attack's integration at the model level. We validate our findings and the proposed metric across multiple datasets and diverse attack types, offering a new dimension for understanding and evaluating backdoor threats. Our code is available at: https://github.com/XinyuLiu71/Information_Bottleneck_Backdoor.git.
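As a concrete illustration of the quantity such an analysis tracks, the sketch below estimates I(T; Y) for a batch of hidden representations with a simple binning estimator; evaluating it per class (target vs. clean) at each epoch yields the kind of MI trajectory the paper studies. The estimator choice, bin count, and all function names are illustrative assumptions, not the authors' implementation (see their repository for that).

```python
# Hedged sketch: binning-based estimate of I(T; Y) between hidden
# representations T and labels Y. The binning scheme and 30-bin choice
# are assumptions for illustration, not the paper's actual estimator.
import numpy as np

def discretize(acts: np.ndarray, n_bins: int = 30) -> np.ndarray:
    """Bin continuous activations so entropies can be counted."""
    edges = np.linspace(acts.min(), acts.max(), n_bins + 1)
    return np.digitize(acts, edges[1:-1])

def entropy(symbols: np.ndarray) -> float:
    """Shannon entropy (nats), treating each row as one discrete symbol."""
    _, counts = np.unique(symbols, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def mi_t_y(hidden: np.ndarray, labels: np.ndarray) -> float:
    """I(T; Y) = H(T) - H(T | Y)."""
    t = discretize(hidden)
    h_t_given_y = sum(
        (labels == c).mean() * entropy(t[labels == c])
        for c in np.unique(labels)
    )
    return entropy(t) - h_t_given_y

# Per epoch, one would compare mi_t_y restricted to the backdoor target
# class against the clean classes and watch how the gap evolves.
```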
Related papers
- Backdoor Attacks on Multi-modal Contrastive Learning [0.0]
This paper offers a thorough and comparative review of backdoor attacks in contrastive learning. It analyzes threat models, attack methods, target domains, and available defenses. Our findings have significant implications for the secure deployment of systems in industrial and distributed environments.
arXiv Detail & Related papers (2026-01-16T05:40:57Z)
- Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation [9.961238260113916]
We introduce a novel class of backdoors that builds upon recent advancements in architectural backdoors. We show that such attacks are not only feasible but also alarmingly effective. We propose a deterministic mitigation strategy that provides formal guarantees against this new attack vector.
arXiv Detail & Related papers (2025-05-23T19:28:45Z)
- Dullahan: Stealthy Backdoor Attack against Without-Label-Sharing Split Learning [29.842087372804905]
We propose a stealthy backdoor attack strategy tailored to the without-label-sharing split learning architecture.
Our stealthy backdoor attack (SBAT) achieves a higher level of stealthiness by refraining from modifying any intermediate parameters during training.
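For readers unfamiliar with the setting, here is a minimal sketch of without-label-sharing ("U-shaped") split learning, where the server computes a middle segment and never sees labels; module names and sizes are illustrative assumptions, and the attack itself is not reproduced.

```python
# Minimal sketch of U-shaped split learning in PyTorch. A server-side
# adversary sits at server_body; labels remain on the client.
import torch
import torch.nn as nn

client_head = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
server_body = nn.Sequential(nn.Linear(256, 128), nn.ReLU())  # adversary here
client_tail = nn.Linear(128, 10)  # labels stay client-side

x = torch.randn(32, 1, 28, 28)
smashed = client_head(x)           # client sends "smashed" activations
server_out = server_body(smashed)  # server computes without seeing labels
logits = client_tail(server_out)   # client finishes forward pass and loss
```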
arXiv Detail & Related papers (2024-05-21T13:03:06Z)
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
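A toy version of such a model parsing network might look like the following; the architecture and single attribute head are assumptions for illustration, not the paper's design.

```python
# Hypothetical MPN sketch: an adversarial perturbation goes in, logits over
# victim-model attributes (e.g., architecture family) come out. In practice
# several attribute heads would be trained; one is shown for brevity.
import torch
import torch.nn as nn

class MPN(nn.Module):
    def __init__(self, n_attr_classes: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_attr_classes),
        )

    def forward(self, delta: torch.Tensor) -> torch.Tensor:
        return self.net(delta)

delta = torch.randn(8, 3, 32, 32)  # stands in for x_adv - x_clean
attr_logits = MPN()(delta)          # trained with ordinary cross-entropy
```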
arXiv Detail & Related papers (2023-03-13T21:21:49Z)
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
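For context, a standard PGD-style adversarial-training step, the kind of defense being evaluated here, can be sketched as follows; the hyperparameters are illustrative, and the paper's hybrid strategy is not reproduced.

```python
# One PGD adversarial-training step: inner maximization finds a worst-case
# perturbation, outer minimization trains on the perturbed batch.
import torch
import torch.nn.functional as F

def pgd_adv_train_step(model, x, y, opt, eps=8/255, alpha=2/255, steps=7):
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):  # inner maximization
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    opt.zero_grad()  # outer minimization on the perturbed batch
    F.cross_entropy(model((x + delta).clamp(0, 1)), y).backward()
    opt.step()
```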
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Learning to Detect: A Data-driven Approach for Network Intrusion Detection [17.288512506016612]
We perform a comprehensive study on NSL-KDD, a network traffic dataset, by visualizing patterns and employing different learning-based models to detect cyber attacks.
Unlike previous shallow and deep learning approaches that rely on a single learning model for intrusion detection, we adopt a hierarchical strategy.
We demonstrate the advantage of the unsupervised representation learning model in binary intrusion detection tasks.
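A minimal sketch of such a two-stage hierarchy, assuming an autoencoder for the unsupervised stage and NSL-KDD's 41 input features, might look like:

```python
# Stage 1: an autoencoder learns traffic representations without labels.
# Stage 2: a lightweight classifier flags attacks from the frozen codes.
# Layer sizes are illustrative assumptions; 41 is the NSL-KDD feature count.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(41, 16), nn.ReLU(), nn.Linear(16, 8))
decoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 41))
detector = nn.Linear(8, 2)  # binary: normal vs. attack

x = torch.randn(64, 41)  # a batch of preprocessed flow records
recon_loss = nn.functional.mse_loss(decoder(encoder(x)), x)  # stage 1
logits = detector(encoder(x).detach())                       # stage 2
```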
arXiv Detail & Related papers (2021-08-18T21:19:26Z)
- The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
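A toy construction in this spirit, assuming white-box access to the deployed system, wraps the model with one added unit that fires only on an attacker-chosen trigger input; this is purely illustrative, not the paper's procedure.

```python
# Toy "stealth" modification: behavior is unchanged except on inputs nearly
# identical to the trigger, where the target-class logit gets a large boost.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StealthWrapper(nn.Module):
    def __init__(self, model, trigger, target_class, n_classes, thresh=0.999):
        super().__init__()
        self.model = model
        self.trigger = F.normalize(trigger.flatten(), dim=0)
        self.bump = torch.zeros(n_classes)
        self.bump[target_class] = 10.0  # large logit boost on the trigger
        self.thresh = thresh

    def forward(self, x):
        logits = self.model(x)
        sim = F.normalize(x.flatten(1), dim=1) @ self.trigger
        return logits + (sim > self.thresh).float().unsqueeze(1) * self.bump
```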
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
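Collecting such activation profiles is straightforward with forward hooks; the hook targets and the summary statistic below are assumptions for illustration.

```python
# Gather per-layer mean activations ("profiles") via forward hooks; comparing
# profiles for clean vs. adversarial inputs is what such a framework plots.
import torch
import torch.nn as nn

def activation_profile(model: nn.Module, x: torch.Tensor) -> dict:
    profile, hooks = {}, []
    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU):  # profile post-activation layers
            hooks.append(module.register_forward_hook(
                lambda m, i, out, n=name: profile.__setitem__(
                    n, out.mean(0).flatten())))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return profile  # layer name -> mean activation vector
```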
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Dynamic backdoor attacks against federated learning [0.5482532589225553]
Federated Learning (FL) is a machine learning framework that enables millions of participants to collaboratively train a model without compromising data privacy or security.
In this paper, we focus on dynamic backdoor attacks under FL setting, where the goal of the adversary is to reduce the performance of the model on targeted tasks.
To the best of our knowledge, this is the first paper to focus on dynamic backdoor attacks in the FL setting.
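A toy FedAvg round with one malicious client stamping a dynamically placed trigger might be sketched as follows; the trigger logic and training loop are illustrative assumptions, not the paper's attack.

```python
# One FedAvg round: each client trains a local copy; a malicious client
# poisons its batch with a randomly placed (dynamic) trigger first.
import copy, random
import torch
import torch.nn.functional as F

def poison(x, y, target=0):
    """Stamp a 3x3 trigger at a random location and relabel to the target."""
    x = x.clone()
    i = random.randint(0, x.shape[-2] - 3)
    j = random.randint(0, x.shape[-1] - 3)
    x[..., i:i + 3, j:j + 3] = 1.0
    return x, torch.full_like(y, target)

def fedavg_round(global_model, client_batches, malicious_ids):
    states = []
    for cid, (x, y) in enumerate(client_batches):
        local = copy.deepcopy(global_model)
        if cid in malicious_ids:
            x, y = poison(x, y)
        opt = torch.optim.SGD(local.parameters(), lr=0.1)
        opt.zero_grad()
        F.cross_entropy(local(x), y).backward()
        opt.step()
        states.append(local.state_dict())
    return {k: torch.stack([s[k].float() for s in states]).mean(0)
            for k in states[0]}  # load via global_model.load_state_dict
```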
arXiv Detail & Related papers (2020-11-15T01:32:58Z)
- Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
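The min-max structure of such obfuscation can be sketched with plain MLPs standing in for the GNN and cross-entropy standing in for the total-variation/Wasserstein objectives; everything here is a simplified assumption.

```python
# Adversarial obfuscation sketch: the encoder is trained so a discriminator
# cannot recover the sensitive attribute s from embeddings, while a task
# head stays accurate on the labels y.
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(32, 16)   # stand-in for a GNN encoder
task = nn.Linear(16, 7)   # downstream label head
disc = nn.Linear(16, 2)   # adversary guessing the sensitive attribute

opt_main = torch.optim.Adam(list(enc.parameters()) + list(task.parameters()))
opt_disc = torch.optim.Adam(disc.parameters())

def train_step(x, y, s, lam=1.0):
    z = enc(x)
    d_loss = F.cross_entropy(disc(z.detach()), s)  # adversary learns to infer s
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()
    # encoder/task minimize task loss while fooling the (frozen) adversary
    main = F.cross_entropy(task(z), y) - lam * F.cross_entropy(disc(z), s)
    opt_main.zero_grad(); main.backward(); opt_main.step()
```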
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
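One way to use feature-space information to strengthen an adversary, sketched under assumed access to an intermediate feature extractor `feats`, is to pull the sample's features toward a target-class exemplar while raising the classification loss; this is an illustration, not the paper's exact method.

```python
# Feature-space adversary sketch: ascend a loss that both increases
# cross-entropy and decreases feature distance to a target-class sample.
import torch
import torch.nn.functional as F

def feature_space_attack(model, feats, x, x_target, y,
                         eps=8/255, alpha=2/255, steps=10):
    delta = torch.zeros_like(x, requires_grad=True)
    target_feat = feats(x_target).detach()
    for _ in range(steps):
        loss = (F.cross_entropy(model(x + delta), y)
                - F.mse_loss(feats(x + delta), target_feat))
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1)
```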
arXiv Detail & Related papers (2020-07-29T08:38:10Z)