Analyzing the Impact of Adversarial Examples on Explainable Machine Learning
- URL: http://arxiv.org/abs/2307.08327v1
- Date: Mon, 17 Jul 2023 08:50:36 GMT
- Title: Analyzing the Impact of Adversarial Examples on Explainable Machine Learning
- Authors: Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak
- Abstract summary: Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions.
Work on the vulnerability of deep learning models has shown that it is easy to craft samples that cause a model to produce predictions its designers did not intend.
In this work, we analyze the impact of adversarial attacks on model interpretability in text classification problems.
- Score: 0.31498833540989407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks are a type of attack on machine learning models where an
attacker deliberately modifies the inputs to cause the model to make incorrect
predictions. Adversarial attacks can have serious consequences, particularly in
applications such as autonomous vehicles, medical diagnosis, and security
systems. Work on the vulnerability of deep learning models has shown that it is
easy to craft samples that cause a model to produce predictions its designers
did not intend. In this work, we analyze the impact of adversarial attacks on
model interpretability in text classification problems. We develop an ML-based
classification model for text data. Then, we introduce adversarial
perturbations into the text data to assess the classification performance after
the attack. Subsequently, we analyze and interpret the model's explainability
before and after the attack.
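To make the pipeline described in the abstract concrete, here is a minimal sketch of one possible instantiation: a scikit-learn text classifier, a deliberately crude character-swap perturbation standing in for a real adversarial text attack, and LIME explanations compared before and after the perturbation. The paper does not specify these tools; every library choice, variable, and constant below is an illustrative assumption.
```python
# Illustrative sketch only; the paper does not name these libraries or this attack.
# Requires: scikit-learn, lime  (pip install scikit-learn lime)
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data standing in for the paper's text-classification dataset.
train_texts = ["the movie was wonderful", "great acting and plot",
               "a dull and boring film", "terrible pacing and an awful script"]
train_labels = [1, 1, 0, 0]          # 1 = positive, 0 = negative

# 1) ML-based text classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

# 2) A crude character-swap perturbation as a placeholder for a real
#    adversarial text attack (e.g. label-flipping word substitutions).
def perturb(text: str) -> str:
    words = text.split()
    if words and len(words[0]) > 2:
        w = words[0]
        words[0] = w[1] + w[0] + w[2:]   # swap the first two characters
    return " ".join(words)

sample = "the movie was wonderful"
adv_sample = perturb(sample)

# 3) Compare predictions and LIME explanations before and after the attack.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
for name, text in [("clean", sample), ("perturbed", adv_sample)]:
    exp = explainer.explain_instance(text, clf.predict_proba, num_features=5)
    print(name, "prediction:", clf.predict([text])[0], "explanation:", exp.as_list())
```
Comparing the two explanation lists is the kind of before/after interpretability analysis the abstract describes, just at toy scale.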
Related papers
- When Machine Learning Models Leak: An Exploration of Synthetic Training Data [0.0]
We investigate an attack on a machine learning model that predicts whether a person or household will relocate in the next two years.
The attack assumes that the attacker can query the model to obtain predictions and that the marginal distribution of the data on which the model was trained is publicly available.
We explore how replacing the original data with synthetic data when training the model impacts how successfully the attacker can infer sensitive attributes.
arXiv Detail & Related papers (2023-10-12T23:47:22Z)
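As a rough illustration of the setting in the entry above (query access to the model plus publicly known marginals), the following sketch shows a generic attribute-inference heuristic: the attacker tries candidate values for a hidden sensitive attribute and keeps the one that makes the queried model's prediction most confident. This is not the paper's procedure; the model, features, and candidate grid are invented for the example.
```python
# Generic attribute-inference sketch; not the paper's exact method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy "relocation" data: columns = [age-like, income-like, sensitive attribute].
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)   # the queried model

# Candidate values for the sensitive attribute, assumed known from its marginal.
sensitive_candidates = np.quantile(X[:, 2], [0.1, 0.3, 0.5, 0.7, 0.9])

def infer_sensitive(partial_record, observed_label):
    """Pick the candidate value that best explains the model's prediction."""
    best_val, best_conf = None, -1.0
    for v in sensitive_candidates:
        record = np.array([[partial_record[0], partial_record[1], v]])
        conf = model.predict_proba(record)[0, observed_label]   # query the model
        if conf > best_conf:
            best_val, best_conf = v, conf
    return best_val

target = X[0]      # attacker knows the non-sensitive features and the label
guess = infer_sensitive(target[:2], y[0])
print("guessed sensitive value:", guess, "true value:", target[2])
```
Training the queried model on synthetic rather than original data, as the paper explores, would change how well this kind of guess correlates with the true attribute.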
- Attacks on Online Learners: a Teacher-Student Analysis [8.567831574941252]
We study the case of adversarial attacks on machine learning models in an online learning setting.
We prove that a discontinuous transition in the learner's accuracy occurs when the attack strength exceeds a critical threshold.
Our findings show that greedy attacks can be extremely efficient, especially when data arrive in small batches.
arXiv Detail & Related papers (2023-05-18T17:26:03Z)
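A toy teacher-student simulation can illustrate the online-attack setting sketched above: a student learns from a stream of labeled examples while an attacker occasionally replaces labels to pull the student toward its own target. The update rule, the "greedy" flipping heuristic, and all constants below are assumptions for illustration, not the paper's analysis.
```python
# Toy teacher-student stream with a label-flipping attacker; illustrative only.
import numpy as np

rng = np.random.default_rng(1)
d, steps, lr = 20, 3000, 0.05

teacher = rng.normal(size=d)
teacher /= np.linalg.norm(teacher)
target = -teacher                   # direction the attacker wants the student to learn
student = np.zeros(d)

for _ in range(steps):
    x = rng.normal(size=d) / np.sqrt(d)
    y = np.sign(teacher @ x)
    # Crude "greedy" rule standing in for the paper's attack: spend the label
    # budget on the examples most informative about the attacker's target.
    if abs(target @ x) > 0.8 / np.sqrt(d):
        y = np.sign(target @ x)
    # Perceptron-style online update on the (possibly poisoned) stream.
    if np.sign(student @ x) != y:
        student += lr * y * x

# Agreement between the attacked student and the clean teacher on fresh inputs.
X_test = rng.normal(size=(2000, d)) / np.sqrt(d)
agreement = np.mean(np.sign(X_test @ student) == np.sign(X_test @ teacher))
print(f"student-teacher agreement after the attack: {agreement:.2f}")
```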
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
arXiv Detail & Related papers (2023-03-13T21:21:49Z)
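The following sketch mimics the idea of a model parsing network in miniature: adversarial perturbations are generated against two toy victim models, and a small supervised classifier is trained to tell which victim produced each perturbation. The victim models, the FGSM-style perturbation, and the MLP are stand-ins chosen for brevity, not the paper's setup of 7 attack types and 135 victim models.
```python
# Toy "model parsing" sketch: classify which victim produced a perturbation.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
d, n_per_victim = 30, 400

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two toy victim models, here just logistic-regression weight vectors.
victims = [rng.normal(size=d) for _ in range(2)]

def fgsm_perturbation(w, x, y, eps=0.1):
    # Sign of the logistic-loss gradient w.r.t. the input (FGSM-style step).
    grad = (sigmoid(w @ x) - y) * w
    return eps * np.sign(grad)

# Dataset of (perturbation, victim-id) pairs.
perturbations, victim_ids = [], []
for vid, w in enumerate(victims):
    for _ in range(n_per_victim):
        x = rng.normal(size=d)
        y = rng.integers(0, 2)
        perturbations.append(fgsm_perturbation(w, x, y))
        victim_ids.append(vid)

# The "model parsing network": a small supervised classifier over perturbations.
Xtr, Xte, ytr, yte = train_test_split(np.array(perturbations),
                                      np.array(victim_ids), random_state=0)
mpn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
mpn.fit(Xtr, ytr)
print("victim-model identification accuracy:", mpn.score(Xte, yte))
```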
- Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks [22.742818282850305]
Camouflaged data poisoning attacks arise in settings, such as machine unlearning, where model retraining can be induced.
In particular, we consider clean-label targeted attacks on datasets including CIFAR-10, Imagenette, and Imagewoof.
This attack is realized by constructing camouflage datapoints that mask the effect of a poisoned dataset.
arXiv Detail & Related papers (2022-12-21T01:52:17Z)
- Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data.
Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples.
Our generated adversarial examples can evade potential detection models, which makes the attack particularly insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z)
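To illustrate what "slightly perturbing both the numerical and categorical features" can look like in code, here is a randomized local search over a toy mixed-type dataset; a change is kept only if it reduces the model's confidence in the true class. This is a generic hill-climbing sketch under invented data and hyper-parameters, not the paper's M-Attack.
```python
# Illustrative greedy search over mixed-type features; not the paper's M-Attack.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n, num_cols, cat_card = 600, 3, 4

# Toy mixed data: 3 numerical columns + 1 categorical column encoded as 0..3.
X_num = rng.normal(size=(n, num_cols))
X_cat = rng.integers(0, cat_card, size=(n, 1))
X = np.hstack([X_num, X_cat])
y = (X[:, 0] + (X[:, 3] == 2) + rng.normal(scale=0.3, size=n) > 0.5).astype(int)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def attack(x, true_label, steps=200, eps=0.05):
    """Randomized local search: tiny numeric nudges and single category swaps."""
    best = x.copy()
    best_p = model.predict_proba([best])[0, true_label]
    for _ in range(steps):
        cand = best.copy()
        if rng.random() < 0.5:
            j = rng.integers(0, num_cols)              # nudge a numerical feature
            cand[j] += rng.normal(scale=eps)
        else:
            cand[num_cols] = rng.integers(0, cat_card)  # swap the categorical value
        p = model.predict_proba([cand])[0, true_label]
        if p < best_p:                                  # keep changes that hurt the true class
            best, best_p = cand, p
    return best

x0 = X[0]
x_adv = attack(x0, y[0])
print("clean prediction:", model.predict([x0])[0],
      "adversarial prediction:", model.predict([x_adv])[0])
```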
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
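Of the four attack families listed above, membership inference is the simplest to sketch: the attacker flags a record as a training member when the target model's confidence on it exceeds a threshold. The threshold, data, and models below are illustrative assumptions; ML-Doctor's actual implementation is not reproduced here.
```python
# Minimal confidence-threshold membership inference; generic sketch only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

# Target model; tree ensembles tend to be confident on their own training data.
target = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def confidence(model, data):
    return model.predict_proba(data).max(axis=1)

# Attacker guesses "member" whenever confidence exceeds a threshold.
threshold = 0.9   # made-up value; in practice calibrated, e.g. via shadow models
tpr = np.mean(confidence(target, X_train) > threshold)   # members correctly flagged
fpr = np.mean(confidence(target, X_out) > threshold)     # non-members wrongly flagged
print(f"membership inference TPR={tpr:.2f}, FPR={fpr:.2f}")
```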
- Adversarial Attack Attribution: Discovering Attributable Signals in Adversarial ML Attacks [0.7883722807601676]
Even production systems, such as self-driving cars and ML-as-a-service offerings, are susceptible to adversarial inputs.
Can perturbed inputs be attributed to the methods used to generate the attack?
We introduce the concept of adversarial attack attribution and create a simple supervised learning experimental framework to examine the feasibility of discovering attributable signals in adversarial attacks.
arXiv Detail & Related papers (2021-01-08T08:16:41Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
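The entry above searches for an ensemble whose adversarial examples transfer widely. The sketch below shows the general shape of such a search: a small genetic algorithm over binary model-selection masks, with a toy fitness that measures how often FGSM-style examples crafted against the selected ensemble fool the remaining models. The victim pool, the fitness proxy, and the GA hyper-parameters are all invented for illustration and are far simpler than the paper's text-classification setting.
```python
# Genetic search for a transferable ensemble; toy fitness, not the paper's method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A pool of victim models trained on different bootstrap samples.
pool = []
for _ in range(8):
    idx = rng.integers(0, len(X), size=len(X))
    pool.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))

def fitness(mask, eps=0.5):
    """Attack the selected ensemble's mean weights, score transfer to the rest."""
    selected = [m for m, b in zip(pool, mask) if b]
    if not selected:
        return 0.0
    w = np.mean([m.coef_[0] for m in selected], axis=0)
    X_adv = X - eps * np.sign(np.outer(2 * y - 1, w))    # push against the true class
    others = [m for m, b in zip(pool, mask) if not b] or pool
    return np.mean([np.mean(m.predict(X_adv) != y) for m in others])

# A tiny genetic algorithm over binary "which models are in the ensemble" masks.
popsize, generations = 12, 15
population = rng.integers(0, 2, size=(popsize, len(pool)))
for _ in range(generations):
    scores = np.array([fitness(m) for m in population])
    parents = population[np.argsort(scores)[-popsize // 2:]]         # selection
    children = parents[rng.integers(0, len(parents), size=popsize)].copy()
    cross = rng.integers(0, 2, size=children.shape).astype(bool)     # uniform crossover
    children[cross] = parents[rng.integers(0, len(parents), size=popsize)][cross]
    flip = rng.random(children.shape) < 0.1                          # mutation
    children[flip] ^= 1
    population = children

best = population[np.argmax([fitness(m) for m in population])]
print("selected ensemble mask:", best, "fooling rate on held-out models:", fitness(best))
```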
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured output of structured prediction models is sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
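The imitation idea above can be sketched generically: query the black-box victim, train a substitute on the returned labels, craft white-box adversarial examples against the substitute, and check how often they transfer back to the victim. The models and the FGSM-style step below are illustrative choices, not the paper's algorithm.
```python
# Generic substitute/imitation sketch: query, imitate, attack, transfer.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1500, n_features=20, random_state=1)

# Black-box victim model; its internals are unknown to the attacker.
victim = RandomForestClassifier(random_state=0).fit(X[:1000], y[:1000])

# The attacker queries the victim on its own unlabeled data and imitates it.
X_query = X[1000:]
victim_labels = victim.predict(X_query)                  # query results only
substitute = LogisticRegression(max_iter=1000).fit(X_query, victim_labels)

# White-box FGSM-style step on the substitute, then transfer to the victim.
eps = 0.5
w = substitute.coef_[0]
X_adv = X_query - eps * np.sign(np.outer(2 * victim_labels - 1, w))

transfer = np.mean(victim.predict(X_adv) != victim_labels)
print(f"fraction of victim predictions flipped by transferred attacks: {transfer:.2f}")
```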
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.