Exploring the Connection between Robust and Generative Models
- URL: http://arxiv.org/abs/2304.04033v4
- Date: Mon, 5 Jun 2023 15:23:05 GMT
- Title: Exploring the Connection between Robust and Generative Models
- Authors: Senad Beadini and Iacopo Masi
- Abstract summary: We show that adversarial points in the input space are very likely under the generative model hidden inside the discriminative classifier.
We present two pieces of evidence: untargeted attacks are even more likely than the natural data, and their likelihood increases with the attack strength.
This allows us to easily detect them and to craft a novel attack, called High-Energy PGD, that fools the classifier yet has an energy similar to that of the data set.
- Score: 7.670469205851674
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We offer a study that connects robust discriminative classifiers trained with
adversarial training (AT) with generative modeling in the form of Energy-based
Models (EBM). We do so by decomposing the loss of a discriminative classifier
and showing that the discriminative model is also aware of the input data
density. Though a common assumption is that adversarial points leave the
manifold of the input data, our study finds that, surprisingly, untargeted
adversarial points in the input space are very likely under the generative
model hidden inside the discriminative classifier; that is, they have low
energy in the EBM. We present two pieces of evidence: untargeted attacks are
even more likely than the natural data, and their likelihood increases as the
attack strength increases.
This allows us to easily detect them and craft a novel attack called
High-Energy PGD that fools the classifier yet has energy similar to the data
set. The code is available at github.com/senad96/Robust-Generative
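As a concrete illustration of the decomposition above: when a classifier with logits f(x) is read as an EBM (as in joint energy-based models), the standard energy score is E(x) = -log Σ_y exp(f(x)[y]), i.e. the negative logsumexp of the logits. Assuming the paper uses this standard score, the minimal PyTorch sketch below compares the energies of clean and PGD-perturbed batches and flags unusually low-energy inputs. The names `model`, `x_clean`, `y`, the threshold rule, and the `high_energy_pgd_sketch` variant are illustrative assumptions, not the authors' implementation; the official code is at github.com/senad96/Robust-Generative.

```python
import torch
import torch.nn.functional as F

def energy(model, x):
    # Energy of x under the EBM implied by the classifier's logits:
    # E(x) = -log sum_y exp(f(x)[y]).  Lower energy = higher likelihood.
    with torch.no_grad():
        return -torch.logsumexp(model(x), dim=1)

def pgd_untargeted(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Plain untargeted L-infinity PGD (illustrative, not the paper's exact setup).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def energy_report(model, x_clean, y):
    # The paper's observation: untargeted attacks land at *lower* energy than
    # clean data, so a simple energy threshold already acts as a detector.
    e_clean = energy(model, x_clean)
    e_adv = energy(model, pgd_untargeted(model, x_clean, y))
    threshold = e_clean.mean() - 2.0 * e_clean.std()   # illustrative cutoff
    return {
        "mean_energy_clean": e_clean.mean().item(),
        "mean_energy_adv": e_adv.mean().item(),
        "flagged_fraction": (e_adv < threshold).float().mean().item(),
    }

def high_energy_pgd_sketch(model, x, y, e_target, lam=1.0,
                           eps=8 / 255, alpha=2 / 255, steps=10):
    # Hypothetical reading of "High-Energy PGD": maximize the classification
    # loss while keeping the energy close to a clean-data target e_target, so
    # the adversarial point no longer stands out as low-energy.  The actual
    # objective is defined in the paper and repository.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        e = -torch.logsumexp(logits, dim=1)
        loss = F.cross_entropy(logits, y) - lam * (e - e_target).pow(2).mean()
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

Per the abstract, calling `energy_report` on a trained classifier and a test batch should show the adversarial mean energy sitting below the clean one, which is exactly what makes the detection side straightforward.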
Related papers
- Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation [18.149950949071982]
We study the energy distribution differences of clean, adversarial, and generated samples across various JEM variants and adversarially trained models.
We propose Energy-based Joint Distribution Adversarial Training to jointly model the clean data distribution, the adversarial distribution, and the classifier.
arXiv Detail & Related papers (2025-05-26T03:26:55Z)
- A Study on Adversarial Robustness of Discriminative Prototypical Learning [0.24999074238880484]
We propose a novel adversarial training framework named Adversarial Deep Positive-Negative Prototypes (Adv-DPNP).
Adv-DPNP integrates discriminative prototype-based learning with adversarial training.
Our approach utilizes a composite loss function combining positive prototype alignment, negative prototype repulsion, and consistency regularization.
arXiv Detail & Related papers (2025-04-03T15:42:58Z)
- MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations.
arXiv Detail & Related papers (2024-10-02T16:05:03Z)
- Black-box Adversarial Transferability: An Empirical Study in Cybersecurity Perspective [0.0]
In adversarial machine learning, malicious users try to fool the deep learning model by feeding adversarially perturbed inputs into the model during its training or testing phase.
We empirically test the black-box adversarial transferability phenomena in cyber attack detection systems.
The results indicate that any deep learning model is highly susceptible to adversarial attacks, even if the attacker does not have access to the internal details of the target model.
arXiv Detail & Related papers (2024-04-15T06:56:28Z)
- Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
- Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness [39.883465335244594]
We show that concentration on small-volume subsets of the input space determines whether a robust classifier exists.
We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, utilizing structure in data naturally leads to classifiers that enjoy data-dependent polyhedral guarantees.
arXiv Detail & Related papers (2023-09-28T01:39:47Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
arXiv Detail & Related papers (2023-03-13T21:21:49Z)
- Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- Mitigating the Impact of Adversarial Attacks in Very Deep Networks [10.555822166916705]
Deep Neural Network (DNN) models have vulnerabilities related to security concerns.
Data poisoning-enabled perturbation attacks are complex adversarial attacks that inject false data into models.
We propose an attack-agnostic-based defense method for mitigating their influence.
arXiv Detail & Related papers (2020-12-08T21:25:44Z)
- Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exists inter-class discrepancy for accuracy and robustness even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
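The last entry combines two ingredients: an instance-level attack on unlabeled data and adversarial training driven by a contrastive objective. The sketch below only illustrates that general idea under simplifying assumptions (a toy two-view NT-Xent loss; the names `encoder`, `x_aug1`, and `x_aug2` are hypothetical); it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    # Simplified two-view contrastive loss: (z1[i], z2[i]) are positives,
    # every other pairing in the batch acts as a negative.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def instancewise_attack(encoder, x_aug1, x_aug2, eps=8 / 255, alpha=2 / 255, steps=5):
    # Perturb one view so the encoder can no longer match it to its sibling view,
    # i.e. confuse the instance-level identity by maximizing the contrastive loss.
    x_adv = x_aug1.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nt_xent(encoder(x_adv), encoder(x_aug2))
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x_aug1 - eps), x_aug1 + eps).clamp(0, 1)
    return x_adv.detach()

def adv_contrastive_step(encoder, x_aug1, x_aug2):
    # Training objective: keep the clean views aligned and pull the adversarial
    # view back toward its clean counterpart, all without labels.
    x_adv = instancewise_attack(encoder, x_aug1, x_aug2)
    return nt_xent(encoder(x_aug1), encoder(x_aug2)) + nt_xent(encoder(x_adv), encoder(x_aug2))
```

A full pipeline would add a projection head and standard augmentation pairs, which are omitted here for brevity.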
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.