FACADE: A Framework for Adversarial Circuit Anomaly Detection and
Evaluation
- URL: http://arxiv.org/abs/2307.10563v1
- Date: Thu, 20 Jul 2023 04:00:37 GMT
- Title: FACADE: A Framework for Adversarial Circuit Anomaly Detection and
Evaluation
- Authors: Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi
Koyejo
- Abstract summary: FACADE is designed for unsupervised mechanistic anomaly detection in deep neural networks.
Our approach seeks to improve model robustness and enhance scalable model oversight, and it demonstrates promising applications in real-world deployment settings.
- Score: 9.025997629442896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present FACADE, a novel probabilistic and geometric framework designed for
unsupervised mechanistic anomaly detection in deep neural networks. Its primary
goal is advancing the understanding and mitigation of adversarial attacks.
FACADE aims to generate probabilistic distributions over circuits, which
provide critical insights into their contributions to changes in the manifold
properties of pseudo-classes, i.e., high-dimensional modes in activation space,
yielding a powerful tool for uncovering and combating adversarial attacks. Our
approach seeks to improve model robustness and enhance scalable model
oversight, and it demonstrates promising applications in real-world deployment
settings.
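To make the idea concrete, the sketch below treats pseudo-classes as high-density modes of clean activations, fits a Gaussian mixture to them, and flags activations with unusually low likelihood. The layer choice, mixture size, and threshold are illustrative assumptions, not the FACADE procedure itself.

    # Minimal sketch of activation-space anomaly detection in the spirit of the
    # abstract above (illustrative only; not the FACADE algorithm). Activations are
    # assumed to be (n_samples, n_features) arrays taken from some hidden layer.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_pseudo_classes(clean_activations, n_modes=5):
        """Fit a mixture whose components stand in for pseudo-classes (modes)."""
        return GaussianMixture(n_components=n_modes, covariance_type="full",
                               random_state=0).fit(clean_activations)

    def anomaly_scores(gmm, activations):
        """Negative log-likelihood under the clean-data mixture; higher = stranger."""
        return -gmm.score_samples(activations)

    # Hypothetical usage with synthetic activations.
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(1000, 64))
    suspect = rng.normal(loc=4.0, size=(5, 64))      # stand-in for off-mode activations
    gmm = fit_pseudo_classes(clean)
    threshold = np.quantile(anomaly_scores(gmm, clean), 0.99)
    print(anomaly_scores(gmm, suspect) > threshold)  # True -> flag as anomalous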
Related papers
- ExAL: An Exploration Enhanced Adversarial Learning Algorithm [0.0]
We propose a novel Exploration-enhanced Adversarial Learning Algorithm (ExAL).
ExAL integrates exploration-driven mechanisms to discover perturbations that maximize impact on the model's decision boundary.
We evaluate the performance of ExAL on the MNIST Handwritten Digits and Blended Malware datasets.
arXiv Detail & Related papers (2024-11-24T15:37:29Z)
- Robust Image Classification: Defensive Strategies against FGSM and PGD Adversarial Attacks [0.0]
Adversarial attacks pose significant threats to the robustness of deep learning models in image classification.
This paper explores and refines defense mechanisms against these attacks to enhance the resilience of neural networks.
arXiv Detail & Related papers (2024-08-20T02:00:02Z)
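Since the preceding entry is built around FGSM and PGD, a compact reference sketch of both attacks helps fix ideas; the PyTorch setting, the epsilon and step-size values, and the [0, 1] input range are assumptions of this sketch, not details taken from the paper.

    # Standard FGSM and PGD attacks (illustrative sketch; see lead-in for assumptions).
    import torch

    def fgsm(model, loss_fn, x, y, eps=8 / 255):
        """One-step attack: shift the input by eps along the sign of the loss gradient."""
        x_adv = x.clone().detach().requires_grad_(True)
        loss_fn(model(x_adv), y).backward()
        return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

    def pgd(model, loss_fn, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
        """Iterated FGSM with projection back into the L-inf ball of radius eps around x."""
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss_fn(model(x_adv), y).backward()
            with torch.no_grad():
                x_adv = x_adv + alpha * x_adv.grad.sign()
                x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto the eps-ball
                x_adv = x_adv.clamp(0.0, 1.0)              # keep inputs in valid range
        return x_adv.detach()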
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
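The detection recipe named in the preceding entry is simple to sketch: caption the input with the target VLM, re-render the caption with a text-to-image model, and flag the input when the re-rendered image drifts away from it in an embedding space. The helpers vlm_caption, t2i_generate, and image_embed and the threshold below are hypothetical placeholders, not the paper's implementation.

    # MirrorCheck-style consistency check (illustrative sketch; the model helpers
    # passed in are hypothetical stand-ins for a real VLM, T2I model, and encoder).
    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    def mirrorcheck_flag(image, vlm_caption, t2i_generate, image_embed, threshold=0.7):
        caption = vlm_caption(image)             # target VLM describes the input
        reconstruction = t2i_generate(caption)   # T2I model re-renders that description
        similarity = cosine(image_embed(image), image_embed(reconstruction))
        # Adversarial inputs tend to elicit captions that do not match the image,
        # so the re-rendered image lands far from the input in embedding space.
        return similarity < threshold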
- ADVREPAIR: Provable Repair of Adversarial Attack [15.580097790702508]
Deep neural networks (DNNs) are increasingly deployed in safety-critical domains, but their vulnerability to adversarial attacks poses serious safety risks.
Existing neuron-level methods that use limited data lack efficacy in repairing adversarial vulnerabilities due to the complexity of adversarial attack mechanisms.
We propose ADVREPAIR, a novel approach for provable repair of adversarial attacks using limited data.
arXiv Detail & Related papers (2024-04-02T05:16:59Z)
- Problem space structural adversarial attacks for Network Intrusion Detection Systems based on Graph Neural Networks [8.629862888374243]
We propose the first formalization of adversarial attacks specifically tailored to GNNs in network intrusion detection.
We outline and model the problem space constraints that attackers need to consider to carry out feasible structural attacks in real-world scenarios.
Our findings demonstrate the increased robustness of the models against classical feature-based adversarial attacks.
arXiv Detail & Related papers (2024-03-18T14:40:33Z)
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
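The training strategy mentioned in the preceding entry follows the generic adversarial-training recipe: perturb part of each batch with an attack and train on the result. The loop below reuses the fgsm sketch from earlier and assumes an ordinary PyTorch model, loss, and optimizer; the equal clean/adversarial weighting is an assumption, not the paper's jet-tagging-specific setup.

    # Generic adversarial-training step (illustrative; not the paper's exact strategy).
    import torch

    def adversarial_training_step(model, loss_fn, optimizer, x, y, eps=8 / 255):
        model.train()
        x_adv = fgsm(model, loss_fn, x, y, eps=eps)   # perturbed copies of the batch
        optimizer.zero_grad()                         # discard gradients from the attack
        # Train on clean and adversarial examples together so robustness does not
        # come entirely at the cost of nominal accuracy.
        loss = 0.5 * loss_fn(model(x), y) + 0.5 * loss_fn(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()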
- CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z)
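For orientation, the Chernoff-Cramer machinery referred to in the preceding entry rests on the classical exponential-moment inequality below; the certificates in CC-Cert are more elaborate, so this is only the generic ingredient. For any real-valued random variable X with a finite moment-generating function and any threshold a,

    \Pr[X \ge a] \;\le\; \inf_{t > 0} \, e^{-t a}\, \mathbb{E}\!\left[e^{t X}\right]

Applied to the random change a perturbation induces in a network's output, this bound turns moment estimates into a probabilistic guarantee that the change stays below a chosen tolerance.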
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches equilibrium distribution of adversarial examples.
Both quantitative and qualitative analyses on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.