Derivation of Information-Theoretically Optimal Adversarial Attacks with
Applications to Robust Machine Learning
- URL: http://arxiv.org/abs/2007.14042v1
- Date: Tue, 28 Jul 2020 07:45:25 GMT
- Title: Derivation of Information-Theoretically Optimal Adversarial Attacks with
Applications to Robust Machine Learning
- Authors: Jirong Yi, Raghu Mudumbai, Weiyu Xu
- Abstract summary: We consider the theoretical problem of designing an optimal adversarial attack on a decision system.
We present derivations of the optimal adversarial attacks for discrete and continuous signals of interest.
We show that adversarial attacks that minimize mutual information are much harder to achieve when multiple redundant copies of the input signal are available.
- Score: 11.206758778146288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the theoretical problem of designing an optimal adversarial
attack on a decision system that maximally degrades the achievable performance
of the system as measured by the mutual information between the degraded signal
and the label of interest. This problem is motivated by the existence of
adversarial examples for machine learning classifiers. By adopting an
information theoretic perspective, we seek to identify conditions under which
adversarial vulnerability is unavoidable, i.e., even optimally designed
classifiers will be vulnerable to small adversarial perturbations. We present
derivations of the optimal adversarial attacks for discrete and continuous
signals of interest, i.e., finding the optimal perturbation distributions to
minimize the mutual information between the degraded signal and a signal
following a continuous or discrete distribution. In addition, we show that
adversarial attacks that minimize mutual information are much harder to achieve
when multiple redundant copies of the input signal are available. This provides
additional support to the recently proposed "feature compression" hypothesis
as an explanation for the adversarial vulnerability of deep learning
classifiers. We also report computational experiments that illustrate our
theoretical results.
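The two claims above can be illustrated with a minimal numerical sketch. The setting below is an assumption chosen for simplicity, not the paper's construction: a uniform binary label Y, an antipodal signal S(Y) = +1 or -1, and an additive perturbation modeled as plain Gaussian noise (the paper derives the actual optimal perturbation distributions). The code estimates I(Y; X) for a single perturbed observation and for k redundant, independently perturbed copies.

import numpy as np


def mutual_information_bits(sigma, k=1):
    """I(Y; X_1..X_k) in bits for X_i = S(Y) + Z_i with S(Y) = +-1 and Z_i ~ N(0, sigma^2)."""
    # With k independent copies, the sample mean is a sufficient statistic for S(Y),
    # so the k-copy problem reduces to one observation with noise std sigma / sqrt(k).
    eff_sigma = sigma / np.sqrt(k)
    x, dx = np.linspace(-12.0, 12.0, 24001, retstep=True)  # integration grid

    def gauss(mu):
        return np.exp(-0.5 * ((x - mu) / eff_sigma) ** 2) / (eff_sigma * np.sqrt(2.0 * np.pi))

    p_pos, p_neg = gauss(+1.0), gauss(-1.0)  # p(x | Y=1) and p(x | Y=0), with P(Y) uniform
    p_mix = 0.5 * (p_pos + p_neg)            # marginal p(x)
    eps = 1e-300                             # guards log(0); negligible otherwise
    integrand = 0.5 * p_pos * np.log2((p_pos + eps) / (p_mix + eps)) \
              + 0.5 * p_neg * np.log2((p_neg + eps) / (p_mix + eps))
    return float(integrand.sum() * dx)       # I(Y; X) = sum_y P(y) * KL(p(x|y) || p(x))


print("nearly clean observation     :", round(mutual_information_bits(0.1), 3), "bits")
print("perturbed, single copy       :", round(mutual_information_bits(2.0, k=1), 3), "bits")
print("perturbed, 5 redundant copies:", round(mutual_information_bits(2.0, k=5), 3), "bits")

In this toy setting the perturbation drives the single-copy mutual information well below one bit, while the value for five redundant copies remains substantially higher, illustrating why attacks that minimize mutual information become harder when redundant copies of the signal are available.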
Related papers
- Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning [49.417414031031264]
This paper studies learning fair encoders in a self-supervised learning setting.
All data are unlabeled and only a small portion of them are annotated with sensitive attributes.
arXiv Detail & Related papers (2024-06-09T08:11:12Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic and generalisable framework for which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can stem from biases in data acquisition rather than from the underlying task.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training scheme to implement this objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Generalizable Information Theoretic Causal Representation [37.54158138447033]
We propose to learn causal representation from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph.
The optimization involves a counterfactual loss, from which we deduce a theoretical guarantee that the causality-inspired learning achieves reduced sample complexity and better generalization ability.
arXiv Detail & Related papers (2022-02-17T00:38:35Z)
- Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
- Robust Machine Learning via Privacy/Rate-Distortion Theory [34.28921458311185]
Robust machine learning formulations have emerged to address the prevalent vulnerability of deep neural networks to adversarial examples.
Our work draws the connection between optimal robust learning and the privacy-utility tradeoff problem, which is a generalization of the rate-distortion problem.
This information-theoretic perspective sheds light on the fundamental tradeoff between robustness and clean data performance.
arXiv Detail & Related papers (2020-07-22T21:34:59Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
- Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization [15.087280646796527]
Training machine learning models that are robust against adversarial inputs poses seemingly insurmountable challenges.
We develop a notion of representation vulnerability that captures the maximum change of mutual information between the input and output distributions.
We propose an unsupervised learning method for obtaining intrinsically robust representations by maximizing the worst-case mutual information (one reading of this objective is sketched after this list).
arXiv Detail & Related papers (2020-02-26T21:20:40Z)
- Guess First to Enable Better Compression and Adversarial Robustness [5.579029325265822]
We propose a bio-inspired classification framework in which model inference is conditioned on a label hypothesis.
We provide a class of training objectives for this framework and an information bottleneck regularizer.
Better compression and elimination of label information further improve adversarial robustness without loss of natural accuracy.
arXiv Detail & Related papers (2020-01-10T05:12:22Z)
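Among the entries above, "Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization" is closest in spirit to the present paper, and its summary admits a direct formalization. The following is only one reading of that summary; the notation (the perturbation ball B_eps, the representation g, and RV_eps) is chosen here for illustration rather than taken from the cited paper:

\[
\mathrm{RV}_{\epsilon}(g) \;=\; \max_{X' \in \mathcal{B}_{\epsilon}(X)} \Big[ I\big(X; g(X)\big) - I\big(X'; g(X')\big) \Big],
\qquad
g^{\star} \;=\; \arg\max_{g}\; \min_{X' \in \mathcal{B}_{\epsilon}(X)} I\big(X'; g(X')\big).
\]

Here X is the input distribution, X' ranges over input distributions reachable by perturbations of size at most epsilon, RV_eps(g) corresponds to the representation vulnerability described in the summary (the maximum change of mutual information between input and output), and g* is the representation obtained by maximizing the worst-case mutual information.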
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated summaries and is not responsible for any consequences of using them.