Analyzing the Impact of Adversarial Examples on Explainable Machine Learning
- URL: http://arxiv.org/abs/2307.08327v1
- Date: Mon, 17 Jul 2023 08:50:36 GMT
- Title: Analyzing the Impact of Adversarial Examples on Explainable Machine Learning
- Authors: Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak
- Abstract summary: Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions.
Work on the vulnerability of deep learning models has shown that it is easy to craft samples that cause a model to produce predictions its designers did not intend.
In this work, we analyze the impact of adversarial attacks on model interpretability in text classification problems.
- Score: 0.31498833540989407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks are a type of attack on machine learning models where an
attacker deliberately modifies the inputs to cause the model to make incorrect
predictions. Adversarial attacks can have serious consequences, particularly in
applications such as autonomous vehicles, medical diagnosis, and security
systems. Work on the vulnerability of deep learning models has shown that it is
easy to craft samples that cause a model to produce predictions its designers
did not intend. In this work, we analyze the impact of adversarial attacks on
model interpretability in text classification problems. We develop an ML-based
classification model for text data. Then, we introduce adversarial
perturbations into the text data to assess the classification performance after
the attack. Subsequently, we analyze and interpret the model's explainability
before and after the attack.
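To make the pipeline described in the abstract concrete, here is a minimal sketch of one possible instantiation: a scikit-learn text classifier, a deliberately crude character-swap perturbation standing in for a real adversarial text attack, and LIME explanations compared before and after the perturbation. The paper does not specify these tools; every library choice, variable, and constant below is an illustrative assumption.
```python
# Illustrative sketch only; the paper does not name these libraries or this attack.
# Requires: scikit-learn, lime  (pip install scikit-learn lime)
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data standing in for the paper's text-classification dataset.
train_texts = ["the movie was wonderful", "great acting and plot",
               "a dull and boring film", "terrible pacing and an awful script"]
train_labels = [1, 1, 0, 0]          # 1 = positive, 0 = negative

# 1) ML-based text classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

# 2) A crude character-swap perturbation as a placeholder for a real
#    adversarial text attack (e.g. label-flipping word substitutions).
def perturb(text: str) -> str:
    words = text.split()
    if words and len(words[0]) > 2:
        w = words[0]
        words[0] = w[1] + w[0] + w[2:]   # swap the first two characters
    return " ".join(words)

sample = "the movie was wonderful"
adv_sample = perturb(sample)

# 3) Compare predictions and LIME explanations before and after the attack.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
for name, text in [("clean", sample), ("perturbed", adv_sample)]:
    exp = explainer.explain_instance(text, clf.predict_proba, num_features=5)
    print(name, "prediction:", clf.predict([text])[0], "explanation:", exp.as_list())
```
Comparing the two explanation lists is the kind of before/after interpretability analysis the abstract describes, just at toy scale.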
Related papers
- When Machine Learning Models Leak: An Exploration of Synthetic Training Data [0.0]
We investigate an attack on a machine learning model that predicts whether a person or household will relocate in the next two years.
The attack assumes that the attacker can query the model to obtain predictions and that the marginal distribution of the data on which the model was trained is publicly available.
We explore how replacing the original data with synthetic data when training the model impacts how successfully the attacker can infer sensitive attributes.
arXiv Detail & Related papers (2023-10-12T23:47:22Z)
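As a rough illustration of the setting in the entry above (query access to the model plus publicly known marginals), the following sketch shows a generic attribute-inference heuristic: the attacker tries candidate values for a hidden sensitive attribute and keeps the one that makes the queried model's prediction most confident. This is not the paper's procedure; the model, features, and candidate grid are invented for the example.
```python
# Generic attribute-inference sketch; not the paper's exact method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy "relocation" data: columns = [age-like, income-like, sensitive attribute].
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)   # the queried model

# Candidate values for the sensitive attribute, assumed known from its marginal.
sensitive_candidates = np.quantile(X[:, 2], [0.1, 0.3, 0.5, 0.7, 0.9])

def infer_sensitive(partial_record, observed_label):
    """Pick the candidate value that best explains the model's prediction."""
    best_val, best_conf = None, -1.0
    for v in sensitive_candidates:
        record = np.array([[partial_record[0], partial_record[1], v]])
        conf = model.predict_proba(record)[0, observed_label]   # query the model
        if conf > best_conf:
            best_val, best_conf = v, conf
    return best_val

target = X[0]      # attacker knows the non-sensitive features and the label
guess = infer_sensitive(target[:2], y[0])
print("guessed sensitive value:", guess, "true value:", target[2])
```
Training the queried model on synthetic rather than original data, as the paper explores, would change how well this kind of guess correlates with the true attribute.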
- Attacks on Online Learners: a Teacher-Student Analysis [8.567831574941252]
We study the case of adversarial attacks on machine learning models in an online learning setting.
We prove that a discontinuous transition in the learner's accuracy occurs when the attack strength exceeds a critical threshold.
Our findings show that greedy attacks can be extremely efficient, especially when data arrive in small batches.
arXiv Detail & Related papers (2023-05-18T17:26:03Z)
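A toy teacher-student simulation can illustrate the online-attack setting sketched above: a student learns from a stream of labeled examples while an attacker occasionally replaces labels to pull the student toward its own target. The update rule, the "greedy" flipping heuristic, and all constants below are assumptions for illustration, not the paper's analysis.
```python
# Toy teacher-student stream with a label-flipping attacker; illustrative only.
import numpy as np

rng = np.random.default_rng(1)
d, steps, lr = 20, 3000, 0.05

teacher = rng.normal(size=d)
teacher /= np.linalg.norm(teacher)
target = -teacher                   # direction the attacker wants the student to learn
student = np.zeros(d)

for _ in range(steps):
    x = rng.normal(size=d) / np.sqrt(d)
    y = np.sign(teacher @ x)
    # Crude "greedy" rule standing in for the paper's attack: spend the label
    # budget on the examples most informative about the attacker's target.
    if abs(target @ x) > 0.8 / np.sqrt(d):
        y = np.sign(target @ x)
    # Perceptron-style online update on the (possibly poisoned) stream.
    if np.sign(student @ x) != y:
        student += lr * y * x

# Agreement between the attacked student and the clean teacher on fresh inputs.
X_test = rng.normal(size=(2000, d)) / np.sqrt(d)
agreement = np.mean(np.sign(X_test @ student) == np.sign(X_test @ teacher))
print(f"student-teacher agreement after the attack: {agreement:.2f}")
```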
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
arXiv Detail & Related papers (2023-03-13T21:21:49Z)
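The following sketch mimics the idea of a model parsing network in miniature: adversarial perturbations are generated against two toy victim models, and a small supervised classifier is trained to tell which victim produced each perturbation. The victim models, the FGSM-style perturbation, and the MLP are stand-ins chosen for brevity, not the paper's setup of 7 attack types and 135 victim models.
```python
# Toy "model parsing" sketch: classify which victim produced a perturbation.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
d, n_per_victim = 30, 400

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two toy victim models, here just logistic-regression weight vectors.
victims = [rng.normal(size=d) for _ in range(2)]

def fgsm_perturbation(w, x, y, eps=0.1):
    # Sign of the logistic-loss gradient w.r.t. the input (FGSM-style step).
    grad = (sigmoid(w @ x) - y) * w
    return eps * np.sign(grad)

# Dataset of (perturbation, victim-id) pairs.
perturbations, victim_ids = [], []
for vid, w in enumerate(victims):
    for _ in range(n_per_victim):
        x = rng.normal(size=d)
        y = rng.integers(0, 2)
        perturbations.append(fgsm_perturbation(w, x, y))
        victim_ids.append(vid)

# The "model parsing network": a small supervised classifier over perturbations.
Xtr, Xte, ytr, yte = train_test_split(np.array(perturbations),
                                      np.array(victim_ids), random_state=0)
mpn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
mpn.fit(Xtr, ytr)
print("victim-model identification accuracy:", mpn.score(Xte, yte))
```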
- Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks [22.742818282850305]
Camouflaged data poisoning attacks arise in settings, such as machine unlearning, where model retraining can be induced.
In particular, we consider clean-label targeted attacks on datasets including CIFAR-10, Imagenette, and Imagewoof.
This attack is realized by constructing camouflage datapoints that mask the effect of a poisoned dataset.
arXiv Detail & Related papers (2022-12-21T01:52:17Z)
- Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data.
Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples.
Our generated adversarial examples can evade potential detection models, which makes the attack particularly insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z)
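To illustrate what "slightly perturbing both the numerical and categorical features" can look like in code, here is a randomized local search over a toy mixed-type dataset; a change is kept only if it reduces the model's confidence in the true class. This is a generic hill-climbing sketch under invented data and hyper-parameters, not the paper's M-Attack.
```python
# Illustrative greedy search over mixed-type features; not the paper's M-Attack.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n, num_cols, cat_card = 600, 3, 4

# Toy mixed data: 3 numerical columns + 1 categorical column encoded as 0..3.
X_num = rng.normal(size=(n, num_cols))
X_cat = rng.integers(0, cat_card, size=(n, 1))
X = np.hstack([X_num, X_cat])
y = (X[:, 0] + (X[:, 3] == 2) + rng.normal(scale=0.3, size=n) > 0.5).astype(int)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def attack(x, true_label, steps=200, eps=0.05):
    """Randomized local search: tiny numeric nudges and single category swaps."""
    best = x.copy()
    best_p = model.predict_proba([best])[0, true_label]
    for _ in range(steps):
        cand = best.copy()
        if rng.random() < 0.5:
            j = rng.integers(0, num_cols)              # nudge a numerical feature
            cand[j] += rng.normal(scale=eps)
        else:
            cand[num_cols] = rng.integers(0, cat_card)  # swap the categorical value
        p = model.predict_proba([cand])[0, true_label]
        if p < best_p:                                  # keep changes that hurt the true class
            best, best_p = cand, p
    return best

x0 = X[0]
x_adv = attack(x0, y[0])
print("clean prediction:", model.predict([x0])[0],
      "adversarial prediction:", model.predict([x_adv])[0])
```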
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
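Of the four attack families listed above, membership inference is the simplest to sketch: the attacker flags a record as a training member when the target model's confidence on it exceeds a threshold. The threshold, data, and models below are illustrative assumptions; ML-Doctor's actual implementation is not reproduced here.
```python
# Minimal confidence-threshold membership inference; generic sketch only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

# Target model; tree ensembles tend to be confident on their own training data.
target = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def confidence(model, data):
    return model.predict_proba(data).max(axis=1)

# Attacker guesses "member" whenever confidence exceeds a threshold.
threshold = 0.9   # made-up value; in practice calibrated, e.g. via shadow models
tpr = np.mean(confidence(target, X_train) > threshold)   # members correctly flagged
fpr = np.mean(confidence(target, X_out) > threshold)     # non-members wrongly flagged
print(f"membership inference TPR={tpr:.2f}, FPR={fpr:.2f}")
```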
- Adversarial Attack Attribution: Discovering Attributable Signals in Adversarial ML Attacks [0.7883722807601676]
Even production systems, such as self-driving cars and ML-as-a-service offerings, are susceptible to adversarial inputs.
Can perturbed inputs be attributed to the methods used to generate the attack?
We introduce the concept of adversarial attack attribution and create a simple supervised learning experimental framework to examine the feasibility of discovering attributable signals in adversarial attacks.
arXiv Detail & Related papers (2021-01-08T08:16:41Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
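The entry above searches for an ensemble whose adversarial examples transfer widely. The sketch below shows the general shape of such a search: a small genetic algorithm over binary model-selection masks, with a toy fitness that measures how often FGSM-style examples crafted against the selected ensemble fool the remaining models. The victim pool, the fitness proxy, and the GA hyper-parameters are all invented for illustration and are far simpler than the paper's text-classification setting.
```python
# Genetic search for a transferable ensemble; toy fitness, not the paper's method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A pool of victim models trained on different bootstrap samples.
pool = []
for _ in range(8):
    idx = rng.integers(0, len(X), size=len(X))
    pool.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))

def fitness(mask, eps=0.5):
    """Attack the selected ensemble's mean weights, score transfer to the rest."""
    selected = [m for m, b in zip(pool, mask) if b]
    if not selected:
        return 0.0
    w = np.mean([m.coef_[0] for m in selected], axis=0)
    X_adv = X - eps * np.sign(np.outer(2 * y - 1, w))    # push against the true class
    others = [m for m, b in zip(pool, mask) if not b] or pool
    return np.mean([np.mean(m.predict(X_adv) != y) for m in others])

# A tiny genetic algorithm over binary "which models are in the ensemble" masks.
popsize, generations = 12, 15
population = rng.integers(0, 2, size=(popsize, len(pool)))
for _ in range(generations):
    scores = np.array([fitness(m) for m in population])
    parents = population[np.argsort(scores)[-popsize // 2:]]         # selection
    children = parents[rng.integers(0, len(parents), size=popsize)].copy()
    cross = rng.integers(0, 2, size=children.shape).astype(bool)     # uniform crossover
    children[cross] = parents[rng.integers(0, len(parents), size=popsize)][cross]
    flip = rng.random(children.shape) < 0.1                          # mutation
    children[flip] ^= 1
    population = children

best = population[np.argmax([fitness(m) for m in population])]
print("selected ensemble mask:", best, "fooling rate on held-out models:", fitness(best))
```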
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured output of structured prediction models is sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
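The imitation idea above can be sketched generically: query the black-box victim, train a substitute on the returned labels, craft white-box adversarial examples against the substitute, and check how often they transfer back to the victim. The models and the FGSM-style step below are illustrative choices, not the paper's algorithm.
```python
# Generic substitute/imitation sketch: query, imitate, attack, transfer.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1500, n_features=20, random_state=1)

# Black-box victim model; its internals are unknown to the attacker.
victim = RandomForestClassifier(random_state=0).fit(X[:1000], y[:1000])

# The attacker queries the victim on its own unlabeled data and imitates it.
X_query = X[1000:]
victim_labels = victim.predict(X_query)                  # query results only
substitute = LogisticRegression(max_iter=1000).fit(X_query, victim_labels)

# White-box FGSM-style step on the substitute, then transfer to the victim.
eps = 0.5
w = substitute.coef_[0]
X_adv = X_query - eps * np.sign(np.outer(2 * victim_labels - 1, w))

transfer = np.mean(victim.predict(X_adv) != victim_labels)
print(f"fraction of victim predictions flipped by transferred attacks: {transfer:.2f}")
```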
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.