Related papers: Attack-Centric Approach for Evaluating Transferability of Adversarial Samples in Machine Learning Models

Attack-Centric Approach for Evaluating Transferability of Adversarial Samples in Machine Learning Models

URL: http://arxiv.org/abs/2112.01777v1
Date: Fri, 3 Dec 2021 08:24:23 GMT
Title: Attack-Centric Approach for Evaluating Transferability of Adversarial Samples in Machine Learning Models
Authors: Tochukwu Idika and Ismail Akturk
Abstract summary: Knowing factors that influence transferability of adversarial samples can assist experts to make informed decisions on how to build robust and reliable machine learning systems. This study is to provide insights on the mechanisms behind the transferability of adversarial samples through an attack-centric approach.
Score: 0.30458514384586394
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transferability of adversarial samples became a serious concern due to their impact on the reliability of machine learning system deployments, as they find their way into many critical applications. Knowing factors that influence transferability of adversarial samples can assist experts to make informed decisions on how to build robust and reliable machine learning systems. The goal of this study is to provide insights on the mechanisms behind the transferability of adversarial samples through an attack-centric approach. This attack-centric perspective interprets how adversarial samples would transfer by assessing the impact of machine learning attacks (that generated them) on a given input dataset. To achieve this goal, we generated adversarial samples using attacker models and transferred these samples to victim models. We analyzed the behavior of adversarial samples on victim models and outlined four factors that can influence the transferability of adversarial samples. Although these factors are not necessarily exhaustive, they provide useful insights to researchers and practitioners of machine learning systems.

Related papers

Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning [54.26807397329468]
This work explores a previously overlooked vulnerability in distributed deep learning systems.<n>An adversary who intercepts the intermediate features transmitted between them can still pose a serious threat.<n>We propose an exploitation strategy specifically designed for distributed settings.
arXiv Detail & Related papers (2025-07-09T20:09:00Z)
Toward Realistic Adversarial Attacks in IDS: A Novel Feasibility Metric for Transferability [0.0]
Transferability-based adversarial attacks exploit the ability of adversarial examples to deceive a specific source Intrusion Detection System (IDS) model. These attacks exploit common vulnerabilities in machine learning models to bypass security measures and compromise systems. This paper analyzes the core factors that contribute to transferability, including feature alignment, model architectural similarity, and overlap in the data distributions that each IDS examines.
arXiv Detail & Related papers (2025-04-11T12:15:03Z)
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios. We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
CT-GAT: Cross-Task Generative Adversarial Attack based on Transferability [24.272384832200522]
We propose a novel approach that directly constructs adversarial examples by extracting transferable features across various tasks. Specifically, we train a sequence-to-sequence generative model named CT-GAT using adversarial sample data collected from multiple tasks to acquire universal adversarial features. Results demonstrate that our method achieves superior attack performance with small cost.
arXiv Detail & Related papers (2023-10-22T11:00:04Z)
Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adrial attacks insert small, imperceptible perturbations to input samples that cause large, undesired changes to the output of deep learning models. This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks. We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z)
Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data. Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples. Our generated adversarial examples can evade potential detection models, which makes the attack indeed insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z)
Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors. We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE)
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
Adversarial Robustness of Deep Reinforcement Learning based Dynamic Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems. We first craft different types of adversarial examples by adding perturbations to the input and intervening on the casual factors. Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z)
When and How to Fool Explainable Models (and Humans) with Adversarial Examples [1.439518478021091]
We explore the possibilities and limits of adversarial attacks for explainable machine learning models. First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios. Next, we propose a comprehensive framework to study whether adversarial examples can be generated for explainable models.
arXiv Detail & Related papers (2021-07-05T11:20:55Z)
ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc. We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing. Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
On the Transferability of Adversarial Attacksagainst Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models. We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models. We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce sampling attack, a novel membership inference technique that unlike other standard membership adversaries is able to work under severe restriction of no access to scores of the victim model. We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance. For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.