Measuring the Contribution of Multiple Model Representations in
Detecting Adversarial Instances
- URL: http://arxiv.org/abs/2111.07035v1
- Date: Sat, 13 Nov 2021 04:24:57 GMT
- Authors: Daniel Steinberg, Paul Munro
- Abstract summary: Our paper describes two approaches that incorporate representations from multiple models for detecting adversarial examples.
For many of the scenarios we consider, the results show that performance increases with the number of underlying models used for extracting representations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have been used for a wide variety of tasks. They are
prevalent in computer vision, natural language processing, speech recognition,
and other areas. While these models have worked well under many scenarios, it
has been shown that they are vulnerable to adversarial attacks. This has led to
a proliferation of research into ways that such attacks could be identified
and/or defended against. Our goal is to explore the contribution that can be
attributed to using multiple underlying models for the purpose of adversarial
instance detection. Our paper describes two approaches that incorporate
representations from multiple models for detecting adversarial examples. We
devise controlled experiments for measuring the detection impact of
incrementally utilizing additional models. For many of the scenarios we
consider, the results show that performance increases with the number of
underlying models used for extracting representations.
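
The abstract does not spell out the detection pipeline, so the following is a minimal sketch of the core idea: concatenate representations from several underlying models, train a clean-vs-adversarial detector on them, and repeat while incrementally adding models. Everything here is an assumption for illustration, not the authors' code: the `TinyModel` random-feature extractors stand in for independently trained networks, the logistic-regression detector is one of many possible choices, and the noisy inputs are a crude surrogate for real adversarial examples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

class TinyModel:
    """Stand-in for one independently trained network: a fixed random
    ReLU feature map whose output plays the role of a penultimate-layer
    representation. A real experiment would use trained classifiers."""
    def __init__(self, in_dim, rep_dim, seed):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(in_dim, rep_dim)) / np.sqrt(in_dim)

    def penultimate(self, x):
        return np.maximum(x @ self.W, 0.0)

def detection_auc(models, x_tr, y_tr, x_te, y_te):
    """Fit a detector on concatenated representations from `models`;
    labels are 0 = clean input, 1 = adversarial input."""
    f_tr = np.hstack([m.penultimate(x_tr) for m in models])
    f_te = np.hstack([m.penultimate(x_te) for m in models])
    det = LogisticRegression(max_iter=1000).fit(f_tr, y_tr)
    return roc_auc_score(y_te, det.predict_proba(f_te)[:, 1])

# Controlled sweep: add underlying models one at a time and track AUC,
# mirroring the paper's incremental-model experiments.
rng = np.random.default_rng(0)
clean = rng.normal(size=(400, 32))
adv = clean + 0.5 * rng.normal(size=clean.shape)  # toy "perturbations"
x = np.vstack([clean, adv])
y = np.concatenate([np.zeros(400), np.ones(400)])
idx = rng.permutation(len(x))
tr, te = idx[:600], idx[600:]

models = [TinyModel(32, 16, seed=s) for s in range(5)]
for k in range(1, len(models) + 1):
    auc = detection_auc(models[:k], x[tr], y[tr], x[te], y[te])
    print(f"{k} model(s): detection AUC = {auc:.3f}")
```

If the trend reported in the paper holds, the printed AUC should tend to rise (or at least not fall) as `k` grows, since each added representation gives the detector more signal for separating the two classes.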
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - Adversarial Attacks and Dimensionality in Text Classifiers [3.4179091429029382]
Adversarial attacks on machine learning algorithms have been a key deterrent to the adoption of AI in many real-world use cases.
We study adversarial examples in the field of natural language processing, specifically text classification tasks.
arXiv Detail & Related papers (2024-04-03T11:49:43Z) - Adversarial Examples Detection with Enhanced Image Difference Features
based on Local Histogram Equalization [20.132066800052712]
We propose an adversarial example detection framework based on a high-frequency information enhancement strategy.
This framework can effectively extract and amplify the feature differences between adversarial examples and normal examples.
arXiv Detail & Related papers (2023-05-08T03:14:01Z) - Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data.
Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction by only slightly perturbing both the numerical and categorical features in the given data samples.
The generated adversarial examples can evade potential detection models, which makes the attack particularly insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z) - Detecting Adversarial Examples in Batches -- a geometrical approach [0.0]
We introduce two geometric metrics, density and coverage, and evaluate their use in detecting adversarial samples in batches of unseen data.
We empirically study these metrics using MNIST and two real-world biomedical datasets from MedMNIST, subjected to two different adversarial attacks.
Our experiments show promising results for both metrics in detecting adversarial examples (a sketch of the two metrics appears after this list).
arXiv Detail & Related papers (2022-06-17T12:52:43Z) - Exploiting Multi-Object Relationships for Detecting Adversarial Attacks
in Complex Scenes [51.65308857232767]
Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples.
Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks.
We develop a novel approach to perform context consistency checks using language models.
arXiv Detail & Related papers (2021-08-19T00:52:10Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Adversarial Examples for Unsupervised Machine Learning Models [71.81480647638529]
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models.
We propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.
arXiv Detail & Related papers (2021-03-02T17:47:58Z) - On the Transferability of Adversarial Attacksagainst Neural Text
Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z) - Plausible Counterfactuals: Auditing Deep Learning Classifiers with
Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.