Explaining Adversarial Vulnerability with a Data Sparsity Hypothesis
- URL: http://arxiv.org/abs/2103.00778v1
- Date: Mon, 1 Mar 2021 06:04:31 GMT
- Title: Explaining Adversarial Vulnerability with a Data Sparsity Hypothesis
- Authors: Mahsa Paknezhad, Cuong Phuc Ngo, Amadeus Aristo Winarto, Alistair
Cheong, Beh Chuen Yang, Wu Jiayang, Lee Hwee Kuan
- Abstract summary: Deep learning (DL) models remain susceptible to adversarial attacks.
In this paper, we develop a training framework for DL models to learn decision boundaries that lie far from the support of the data distribution.
We measure the adversarial robustness of models trained using this framework against well-known adversarial attacks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite many proposed algorithms to provide robustness to deep learning (DL)
models, DL models remain susceptible to adversarial attacks. We hypothesize
that the adversarial vulnerability of DL models stems from two factors. The
first factor is data sparsity: in the high-dimensional data space, there are
large regions outside the support of the data distribution. The
second factor is the existence of many redundant parameters in the DL models.
Owing to these factors, different models are able to come up with different
decision boundaries with comparably high prediction accuracy. The appearance of
the decision boundaries in the space outside the support of the data
distribution does not affect the prediction accuracy of the model. However,
it makes an important difference to the adversarial robustness of the model.
We propose that the ideal decision boundary should be as far as possible from
the support of the data distribution. In this paper, we develop a training
framework for DL models to learn such decision boundaries, spanning the space
around the class distributions and away from the data points themselves.
Semi-supervised learning was deployed to achieve this objective by leveraging
unlabeled data generated in the space outside the support of the data
distribution. We measure adversarial robustness of the models trained using
this training framework against well-known adversarial attacks. We found that
our results, together with other regularization methods and adversarial training,
support our hypothesis of data sparsity. We show that unlabeled data generated
by noise using our framework is almost as effective for adversarial robustness
as unlabeled data sourced from existing datasets or generated by synthesis
algorithms. Our code is available at
https://github.com/MahsaPaknezhad/AdversariallyRobustTraining.
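The core of the framework described above is to combine a standard supervised loss with unlabeled points generated in the empty space around the class distributions. The sketch below is a minimal illustration of that idea, assuming the unlabeled points are drawn as uniform noise spanning a box wider than the labeled batch and that confident predictions on them are penalized through an entropy term; the helper make_noise_samples, the entropy regularizer, and the lambda_unlabeled weight are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def make_noise_samples(x, scale=3.0):
    # Illustrative helper (an assumption, not from the paper): draw uniform
    # noise in a box `scale` times wider than the range of the current batch,
    # so most samples fall outside the support of the labeled data.
    lo, hi = x.amin(dim=0), x.amax(dim=0)
    span = hi - lo
    lower = lo - (scale - 1.0) / 2.0 * span
    return lower + scale * span * torch.rand_like(x)

def training_step(model, x, y, lambda_unlabeled=0.1):
    # Supervised loss on the labeled batch.
    loss_sup = F.cross_entropy(model(x), y)

    # Unlabeled noise drawn outside the data support: penalize confident
    # predictions there (i.e., maximize prediction entropy), which pushes
    # the decision boundary away from the labeled class clusters.
    x_noise = make_noise_samples(x)
    probs = F.softmax(model(x_noise), dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()

    return loss_sup + lambda_unlabeled * (-entropy)
```

Used inside an ordinary training loop, the intent of the extra term is to leave accuracy on the labeled data largely untouched while discouraging the decision boundary from passing close to the data support.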
Related papers
- Robust training of implicit generative models for multivariate and heavy-tailed distributions with an invariant statistical loss [0.4249842620609682]
We build on the invariant statistical loss (ISL) method introduced in de2024training.
We extend it to handle heavy-tailed and multivariate data distributions.
We assess its performance in generative modeling and explore its potential as a pretraining technique for generative adversarial networks (GANs).
arXiv Detail & Related papers (2024-10-29T10:27:50Z) - Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z) - From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying [10.919336198760808]
We introduce a novel methodology to detect leaked data that are used to train classification models.
LDSS involves injecting a small volume of synthetic data, characterized by local shifts in class distribution, into the owner's dataset.
This enables the effective identification of models trained on leaked data through model querying alone.
arXiv Detail & Related papers (2023-10-06T10:36:28Z) - Overcoming Overconfidence for Active Learning [1.2776312584227847]
We present two novel methods to address the problem of overconfidence that arises in the active learning scenario.
The first is an augmentation strategy named Cross-Mix-and-Mix (CMaM), which aims to calibrate the model by expanding the limited training distribution.
The second is a selection strategy named Ranked Margin Sampling (RankedMS), which prevents choosing data that leads to overly confident predictions.
arXiv Detail & Related papers (2023-08-21T09:04:54Z) - Topological Interpretability for Deep-Learning [0.30806551485143496]
Deep learning (DL) models cannot quantify the certainty of their predictions.
This work presents a method to infer prominent features in two DL classification models trained on clinical and non-clinical text.
arXiv Detail & Related papers (2023-05-15T13:38:13Z) - Two-Stage Robust and Sparse Distributed Statistical Inference for
Large-Scale Data [18.34490939288318]
We address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers.
We propose a two-stage distributed and robust statistical inference procedure that copes with high-dimensional models by promoting sparsity.
arXiv Detail & Related papers (2022-08-17T11:17:47Z) - Data-SUITE: Data-centric identification of in-distribution incongruous
examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z) - Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)