Interpreting Adversarial Attacks and Defences using Architectures with Enhanced Interpretability
- URL: http://arxiv.org/abs/2502.15017v1
- Date: Thu, 20 Feb 2025 20:12:58 GMT
- Title: Interpreting Adversarial Attacks and Defences using Architectures with Enhanced Interpretability
- Authors: Akshay G Rao, Chandrashekhar Lakshminarayanan, Arun Rajkumar
- Abstract summary: This work capitalizes on a network architecture, namely Deep Linearly Gated Networks (DLGN), which has better interpretation capabilities than regular deep network architectures. We interpret robust models trained using PGD adversarial training and compare them with standard training. We uncover insights into hyperplanes resembling principal components in PGD-AT and STD-TR models, with PGD-AT hyperplanes aligned farther from the data points.
- Score: 4.964101884577697
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks in deep learning represent a significant threat to the integrity and reliability of machine learning models. Adversarial training has been a popular defence technique against these adversarial attacks. In this work, we capitalize on a network architecture, namely Deep Linearly Gated Networks (DLGN), which has better interpretation capabilities than regular deep network architectures. Using this architecture, we interpret robust models trained using PGD adversarial training and compare them with standard training. Feature networks in DLGN act as feature extractors, making them the only medium through which an adversary can attack the model. We analyze the feature network of DLGN with fully connected layers with respect to properties like alignment of the hyperplanes, hyperplane relation with PCA, and sub-network overlap among classes and compare these properties between robust and standard models. We also consider this architecture having CNN layers wherein we qualitatively (using visualizations) and quantitatively contrast gating patterns between robust and standard models. We uncover insights into hyperplanes resembling principal components in PGD-AT and STD-TR models, with PGD-AT hyperplanes aligned farther from the data points. We use path activity analysis to show that PGD-AT models create diverse, non-overlapping active subnetworks across classes, preventing attack-induced gating overlaps. Our visualization ideas show the nature of representations learnt by PGD-AT and STD-TR models.
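To make the analyzed setup concrete, here is a minimal PyTorch sketch of the two ingredients the abstract refers to: a DLGN-style network whose deep linear feature path produces gates for a linear value path, and one step of PGD adversarial training. The soft sigmoid gating, layer sizes, and training hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDLGN(nn.Module):
    """Schematic DLGN-style net (an assumption, not the paper's code): a deep
    linear 'feature' path produces soft gates that modulate a linear 'value'
    path, so all nonlinearity lives in the gating."""
    def __init__(self, d_in=784, width=128, n_class=10, beta=4.0):
        super().__init__()
        self.beta = beta
        self.f1, self.f2 = nn.Linear(d_in, width), nn.Linear(width, width)
        self.v1, self.v2 = nn.Linear(d_in, width), nn.Linear(width, width)
        self.head = nn.Linear(width, n_class)

    def forward(self, x):
        x = x.flatten(1)
        u1 = self.f1(x)                      # linear feature path, no ReLU
        u2 = self.f2(u1)
        g1 = torch.sigmoid(self.beta * u1)   # gates from feature pre-activations
        g2 = torch.sigmoid(self.beta * u2)
        h = self.v1(x) * g1                  # value path gated by the feature net
        h = self.v2(h) * g2
        return self.head(h)

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD (Madry et al.): ascend the loss, project to the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

# One PGD-AT outer step on a stand-in batch.
model = TinyDLGN()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))
x_adv = pgd_attack(model, x, y)
opt.zero_grad()
F.cross_entropy(model(x_adv), y).backward()
opt.step()
```

With hard 0/1 gates, the gates that fire along each path define an active subnetwork per input; the paper's path-activity analysis compares how these subnetworks overlap across classes in PGD-AT versus STD-TR models.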
Related papers
- Instruction-Guided Autoregressive Neural Network Parameter Generation [49.800239140036496]
We propose IGPG, an autoregressive framework that unifies parameter synthesis across diverse tasks and architectures.
By autoregressively generating tokens of neural network weights, IGPG ensures inter-layer coherence and enables efficient adaptation across models and datasets.
Experiments on multiple datasets demonstrate that IGPG consolidates diverse pretrained models into a single, flexible generative framework.
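This blurb does not describe IGPG's actual tokenizer or decoder; the toy sketch below, in which every name and size is a hypothetical stand-in and the decoder is untrained, only illustrates the core loop: decode discrete weight tokens autoregressively, then dequantize them into a flat parameter vector.

```python
import torch
import torch.nn as nn

VOCAB, N_TOKENS = 256, 64          # hypothetical: 256 quantization bins, 64 weight tokens

class TokenDecoder(nn.Module):
    """Toy autoregressive decoder over weight tokens (untrained stand-in)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.rnn = nn.GRU(32, 64, batch_first=True)
        self.out = nn.Linear(64, VOCAB)

    def forward(self, tokens, h=None):
        z, h = self.rnn(self.emb(tokens), h)
        return self.out(z), h

decoder = TokenDecoder()
tok = torch.zeros(1, 1, dtype=torch.long)      # start token
h, out_tokens = None, []
for _ in range(N_TOKENS):                      # greedy autoregressive decoding
    logits, h = decoder(tok, h)
    tok = logits[:, -1].argmax(-1, keepdim=True)
    out_tokens.append(tok.item())

# Dequantize tokens into a flat weight vector in [-1, 1].
weights = torch.tensor(out_tokens, dtype=torch.float32) / (VOCAB - 1) * 2 - 1
print(weights.shape)  # torch.Size([64])
```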
arXiv Detail & Related papers (2025-04-02T05:50:19Z)
- xIDS-EnsembleGuard: An Explainable Ensemble Learning-based Intrusion Detection System [7.2738577621227085]
We focus on addressing the challenges of detecting malicious attacks in networks by designing an advanced Explainable Intrusion Detection System (xIDS).
Existing machine learning and deep learning approaches have hidden limitations, such as potential biases in predictions, a lack of interpretability, and the risk of overfitting to training data.
We propose an ensemble learning technique called "EnsembleGuard" to overcome these challenges.
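The blurb does not say which base learners EnsembleGuard combines; as a generic illustration of the ensemble idea, a soft-voting ensemble over stand-in flow features with scikit-learn might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Stand-in for network-flow features labelled benign (0) / malicious (1).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=8)),
        ("rf", RandomForestClassifier(n_estimators=100)),
    ],
    voting="soft",  # average predicted probabilities across members
)
ensemble.fit(X, y)
print(ensemble.predict_proba(X[:3]))
```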
arXiv Detail & Related papers (2025-03-01T20:49:31Z)
- Adversarial Training for Defense Against Label Poisoning Attacks [53.893792844055106]
Label poisoning attacks pose significant risks to machine learning models.
We propose a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats.
Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training.
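As a rough sketch of the ingredients named here, the following runs PGD on the hinge loss of a linear SVM; the paper uses kernel SVMs, but a linear decision function keeps the example short, and all data is synthetic.

```python
import torch

torch.manual_seed(0)
w = torch.randn(20, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,)).float() * 2 - 1  # labels in {-1, +1}

def hinge(w, b, x, y):
    """SVM hinge loss: mean of max(0, 1 - y * f(x))."""
    return torch.clamp(1 - y * (x @ w + b), min=0).mean()

eps, alpha = 0.3, 0.05
x_adv = x.clone()
for _ in range(10):                               # PGD on the inputs
    x_adv.requires_grad_(True)
    g = torch.autograd.grad(hinge(w, b, x_adv, y), x_adv)[0]
    with torch.no_grad():
        x_adv = (x_adv + alpha * g.sign() - x).clamp(-eps, eps) + x

# Adversarial training step: minimize hinge loss on the perturbed inputs.
opt = torch.optim.SGD([w, b], lr=0.1)
opt.zero_grad()
hinge(w, b, x_adv, y).backward()
opt.step()
```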
arXiv Detail & Related papers (2025-02-24T13:03:19Z)
- An Attentive Graph Agent for Topology-Adaptive Cyber Defence [1.0812794909131096]
We develop a custom version of the Cyber Operations Research Gym (CybORG) environment, encoding network state as a directed graph.
We employ a Graph Attention Network (GAT) architecture to process node, edge, and global features, and adapt its output to be compatible with policy gradient methods in reinforcement learning.
We demonstrate that GAT defensive policies can be trained using our low-level directed graph observations, even when unexpected connections arise during simulation.
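A minimal sketch of the GAT building block, assuming PyTorch Geometric is available and using a toy directed host graph; the CybORG integration and the policy-gradient head are omitted.

```python
import torch
from torch_geometric.nn import GATConv

# Toy network-state graph: 5 hosts, directed edges, 8-dim node features.
x = torch.randn(5, 8)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 4]])   # directed edges src -> dst

gat = GATConv(in_channels=8, out_channels=16, heads=2, concat=True)
node_emb = gat(x, edge_index)               # attention-weighted neighbourhood aggregation
print(node_emb.shape)                       # torch.Size([5, 32]) with 2 concatenated heads

# A policy head over node embeddings could then score per-node defensive
# actions, e.g. logits = torch.nn.Linear(32, n_actions)(node_emb).
```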
arXiv Detail & Related papers (2025-01-24T18:22:37Z)
- Stealing the Invisible: Unveiling Pre-Trained CNN Models through Adversarial Examples and Timing Side-Channels [14.222432788661914]
We present an approach based on the observation that the classification patterns of adversarial images can be used as a means to steal the models.
Our approach exploits varying misclassifications of adversarial images across different models to fingerprint several renowned Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures.
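A toy sketch of the fingerprinting idea, with untrained stand-in models: craft probes on a surrogate, then score each candidate architecture by how closely its (mis)classification pattern matches the victim's responses.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.1):
    """One-step FGSM perturbation used to craft probe inputs."""
    x = x.clone().requires_grad_(True)
    g = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
    return (x + eps * g.sign()).detach()

# Hypothetical setup: a surrogate to craft probes, two candidate models,
# and a black-box victim we can only query for labels.
torch.manual_seed(0)
surrogate, cand_a, cand_b, victim = (torch.nn.Linear(16, 4) for _ in range(4))
x, y = torch.randn(64, 16), torch.randint(0, 4, (64,))
probes = fgsm(surrogate, x, y)

def fingerprint(model, probes):
    return model(probes).argmax(-1)   # pattern of (mis)classifications

# The candidate whose misclassification pattern best matches the victim's
# responses is the likely underlying architecture.
victim_fp = fingerprint(victim, probes)
for name, cand in [("A", cand_a), ("B", cand_b)]:
    match = (fingerprint(cand, probes) == victim_fp).float().mean()
    print(f"candidate {name}: agreement {match:.2f}")
```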
arXiv Detail & Related papers (2024-02-19T08:47:20Z)
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
- Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection [0.0]
Interpretability is as essential as robustness when we deploy the models to the real world.
Standard models, compared to robust ones, are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans.
arXiv Detail & Related papers (2023-07-04T13:51:55Z)
- Common Knowledge Learning for Generating Transferable Adversarial Examples [60.1287733223249]
This paper focuses on an important type of black-box attacks, where the adversary generates adversarial examples by a substitute (source) model.
Existing methods tend to give unsatisfactory adversarial transferability when the source and target models are from different types of DNN architectures.
We propose a common knowledge learning (CKL) framework to learn better network weights to generate adversarial examples.
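CKL itself (how common knowledge is distilled into the substitute) is not detailed here; the sketch below only sets up the transfer scenario it targets, crafting PGD examples on a substitute model and measuring how often they fool a structurally different target. The models are untrained stand-ins, so the printed number is illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Transfer setting: craft on a substitute (source), test on a target.
torch.manual_seed(0)
source = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
target = nn.Sequential(nn.Linear(32, 48), nn.Tanh(), nn.Linear(48, 10))  # different architecture
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))

def pgd(model, x, y, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD crafted against `model` only."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        g = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        with torch.no_grad():
            x_adv = (x_adv + alpha * g.sign() - x).clamp(-eps, eps) + x
    return x_adv

x_adv = pgd(source, x, y)                        # crafted only on the source model
transfer = (target(x_adv).argmax(-1) != y).float().mean()
print(f"transfer fooling rate: {transfer:.2f}")  # CKL aims to raise this rate
```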
arXiv Detail & Related papers (2023-07-01T09:07:12Z)
- Unveiling the potential of Graph Neural Networks for robust Intrusion Detection [2.21481607673149]
We propose a novel Graph Neural Network (GNN) model to learn flow patterns of attacks structured as graphs.
Our model maintains the same level of accuracy as in previous experiments, while state-of-the-art ML techniques see their accuracy (F1-score) degrade by up to 50% under adversarial attacks.
arXiv Detail & Related papers (2021-07-30T16:56:39Z)
- Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks [154.31827097264264]
Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms.
We propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model.
Our DMAT improves performance on normal images and achieves robustness comparable to standard adversarial training against Lp attacks.
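A schematic of the dual-manifold idea, assuming a stand-in generator: perturb raw pixels for the off-manifold (Lp) view and the latent code for the on-manifold view, then train on both. The generator, sizes, and budgets are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))    # stand-in generator
clf = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))
z, y = torch.randn(16, 8), torch.randint(0, 10, (16,))

def pgd_on(loss_fn, v, eps=0.1, alpha=0.02, steps=5):
    """L-infinity PGD on an arbitrary variable v (pixels or latent code)."""
    v_adv = v.clone()
    for _ in range(steps):
        v_adv.requires_grad_(True)
        g = torch.autograd.grad(loss_fn(v_adv), v_adv)[0]
        with torch.no_grad():
            v_adv = (v_adv + alpha * g.sign() - v).clamp(-eps, eps) + v
    return v_adv

x = G(z).detach()                                                    # on-manifold images
x_adv = pgd_on(lambda v: F.cross_entropy(clf(v), y), x)              # image-space (Lp) attack
z_adv = pgd_on(lambda v: F.cross_entropy(clf(G(v)), y), z)           # latent-space attack

opt = torch.optim.SGD(clf.parameters(), lr=0.1)                      # train on both views
opt.zero_grad()
(F.cross_entropy(clf(x_adv), y) + F.cross_entropy(clf(G(z_adv)), y)).backward()
opt.step()
```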
arXiv Detail & Related papers (2020-09-05T06:00:28Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
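A toy sketch of a subgraph trigger in the GTA spirit: splice a small fixed-topology subgraph, with its own node features, into a victim graph's adjacency matrix. The topology, feature values, and anchor choice are all illustrative assumptions.

```python
import torch

N, D, K = 10, 4, 3                              # victim nodes, feature dim, trigger nodes
adj = (torch.rand(N, N) < 0.2).float()          # random stand-in victim graph
feat = torch.randn(N, D)

trigger_adj = torch.ones(K, K) - torch.eye(K)   # fully connected trigger topology
trigger_feat = torch.full((K, D), 0.5)          # descriptive trigger features

# Append trigger nodes and attach them to an anchor node in the victim graph.
big_adj = torch.zeros(N + K, N + K)
big_adj[:N, :N], big_adj[N:, N:] = adj, trigger_adj
big_adj[0, N], big_adj[N, 0] = 1.0, 1.0         # edge anchor <-> trigger
big_feat = torch.cat([feat, trigger_feat], dim=0)
print(big_adj.shape, big_feat.shape)            # (13, 13), (13, 4)
```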
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
- Rethinking Clustering for Robustness [56.14672993686335]
ClusTR is a clustering-based and adversary-free training framework to learn robust models.
ClusTR outperforms adversarially trained networks by up to 4% under strong PGD attacks.
arXiv Detail & Related papers (2020-06-13T16:55:51Z)
- Boosting Adversarial Training with Hypersphere Embedding [53.75693100495097]
Adversarial training is one of the most effective defenses against adversarial attacks for deep learning models.
In this work, we advocate incorporating the hypersphere embedding mechanism into the AT procedure.
We validate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets.
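A minimal sketch of the hypersphere-embedding idea as it is commonly implemented: L2-normalize both features and class weights so logits become scaled cosine similarities (the paper's exact formulation may differ).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphereHead(nn.Module):
    """Classifier head with features and class weights constrained to the
    unit sphere; logits are scaled cosine similarities."""
    def __init__(self, dim=64, n_class=10, scale=15.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_class, dim))
        self.scale = scale

    def forward(self, feats):
        f = F.normalize(feats, dim=-1)          # features onto the unit sphere
        w = F.normalize(self.weight, dim=-1)    # class weights onto the sphere
        return self.scale * f @ w.t()           # cosine logits

head = HypersphereHead()
logits = head(torch.randn(8, 64))
print(logits.shape)  # torch.Size([8, 10])
```

The scale factor compensates for the bounded range of cosine logits; without it, softmax outputs stay too flat for cross-entropy training to converge.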
arXiv Detail & Related papers (2020-02-20T08:42:29Z)