Input Validation for Neural Networks via Runtime Local Robustness
Verification
- URL: http://arxiv.org/abs/2002.03339v2
- Date: Tue, 13 Feb 2024 05:09:49 GMT
- Title: Input Validation for Neural Networks via Runtime Local Robustness
Verification
- Authors: Jiangchao Liu, Liqian Chen, Antoine Miné and Ji Wang
- Abstract summary: We propose to validate inputs for neural networks via runtime local robustness verification.
Experiments show that our approach can protect neural networks from adversarial examples and improve their accuracy.
- Score: 3.4090949470658014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local robustness verification can verify that a neural network is
robust with respect to any perturbation of a specific input within a certain
distance. We call this distance the Robustness Radius. We observe that the
robustness radii of correctly classified inputs are much larger than those of
misclassified inputs, which include adversarial examples, especially those
produced by strong adversarial attacks. Another observation is that the
robustness radii of correctly classified inputs often follow a normal
distribution. Based on these two observations, we propose to validate inputs
for neural networks via runtime local robustness verification. Experiments
show that our approach can protect neural networks from adversarial examples
and improve their accuracy.
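A minimal sketch of the validation scheme the abstract describes, assuming access to a local robustness verifier exposed as a `verify(model, x, eps) -> bool` callback. The verifier itself, the L_inf grid search, and the quantile-based threshold below are implementation assumptions, not necessarily the paper's exact choices.

```python
import numpy as np
from scipy.stats import norm

def robustness_radius(verify, model, x, eps_grid=np.linspace(0.0, 0.5, 26)):
    """Largest epsilon on the grid for which `verify` proves the prediction
    of `model` on `x` cannot change within a ball of that radius.
    `verify(model, x, eps) -> bool` is assumed to wrap a local robustness
    verifier (e.g. an abstract-interpretation or MILP based tool)."""
    radius = 0.0
    for eps in eps_grid[1:]:
        if not verify(model, x, eps):
            break
        radius = eps
    return radius

def fit_threshold(radii_of_correct_inputs, alpha=0.01):
    """Fit a normal distribution to radii measured on correctly classified
    held-out inputs and return a low quantile as the rejection cut-off."""
    mu, sigma = np.mean(radii_of_correct_inputs), np.std(radii_of_correct_inputs)
    return norm.ppf(alpha, loc=mu, scale=sigma)

def validate_input(verify, model, x, threshold):
    """Accept the prediction only if its verified radius is above threshold;
    small radii indicate likely misclassified or adversarial inputs."""
    return robustness_radius(verify, model, x) >= threshold
```

Flagged inputs can then be rejected or handed to a fallback, which is how such a scheme can both block adversarial examples and filter out likely misclassifications.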
Related papers
- Verified Neural Compressed Sensing [58.98637799432153]
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task.
We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements.
We show that the complexity of the network can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
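A toy version of the recovery setup this entry describes, with illustrative dimensions and architecture rather than the paper's configuration: a small MLP is trained to recover a k-sparse vector from linear measurements.

```python
import numpy as np
import torch
import torch.nn as nn

# Toy setup: recover a k-sparse vector x in R^n from m linear measurements y = A x.
n, m, k = 50, 25, 3
rng = np.random.default_rng(0)
A = torch.tensor(rng.standard_normal((m, n)) / np.sqrt(m), dtype=torch.float32)

def sample_batch(batch=128):
    x = torch.zeros(batch, n)
    for i in range(batch):
        support = rng.choice(n, size=k, replace=False)
        x[i, support] = torch.tensor(rng.standard_normal(k), dtype=torch.float32)
    return x, x @ A.T                      # targets and their measurements

net = nn.Sequential(nn.Linear(m, 128), nn.ReLU(), nn.Linear(128, n))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    x, y = sample_batch()
    loss = ((net(y) - x) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
# The paper goes further and *verifies* a recovery-error bound for the trained
# network; that step requires a neural-network verifier and is omitted here.
```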
arXiv Detail & Related papers (2024-05-07T12:20:12Z) - Trust, but Verify: Robust Image Segmentation using Deep Learning [7.220625464268644]

We describe a method for verifying the output of a deep neural network for medical image segmentation.
We show that previous methods for segmentation evaluation that rely on deep neural regression networks are vulnerable to false negatives.
arXiv Detail & Related papers (2023-10-25T20:55:07Z) - Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Neural network predictions are often assumed to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
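One way to act on this observation, sketched under the assumption that the constant the predictions drift towards is the loss-optimal constant prediction (the training class marginal under cross-entropy): abstain when the model's output is close to that constant. The distance and threshold are illustrative, not the paper's procedure.

```python
import numpy as np

def optimal_constant_prediction(train_labels, num_classes):
    """Under cross-entropy, the best input-independent prediction is the
    empirical class marginal of the training labels."""
    counts = np.bincount(train_labels, minlength=num_classes)
    return counts / counts.sum()

def abstain_on_ood(probs, constant, dist_threshold=0.15):
    """Risk-sensitive rule: abstain when the model's softmax output is close
    to the optimal constant prediction, which is where predictions drift as
    inputs become increasingly OOD."""
    distance = np.abs(probs - constant).sum() / 2.0   # total variation
    return distance < dist_threshold                   # True -> abstain
```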
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z) - Can pruning improve certified robustness of neural networks? [106.03070538582222]
We show that neural network pruning can improve the empirical robustness of deep neural networks (NNs).
Our experiments show that by appropriately pruning an NN, its certified accuracy can be boosted up to 8.2% under standard training.
We additionally observe the existence of certified lottery tickets that can match both standard and certified robust accuracies of the original dense models.
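The entry reports gains from pruning combined with certified training; as a point of reference, the sketch below shows plain global magnitude pruning with PyTorch's pruning utilities. The certified-training and verification stages that the paper's numbers depend on are not reproduced.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Illustrative unstructured magnitude pruning of a small classifier.
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))

parameters_to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.8,                      # remove the 80% smallest-magnitude weights
)
for module, name in parameters_to_prune:
    prune.remove(module, name)       # make the pruning permanent
```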
arXiv Detail & Related papers (2022-06-15T05:48:51Z) - DAAIN: Detection of Anomalous and Adversarial Input using Normalizing
Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
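A sketch of the monitoring idea, with a single Gaussian standing in for the normalizing flow that DAAIN actually learns; the monitored layer and the rejection quantile are assumptions. The detection logic, thresholding the log-density of a layer's activations, is the same.

```python
import numpy as np
import torch
from scipy.stats import multivariate_normal

def collect_activations(model, layer, loader):
    """Record the monitored layer's activations over in-distribution data."""
    feats = []
    hook = layer.register_forward_hook(
        lambda mod, inp, out: feats.append(out.detach().flatten(1).cpu()))
    with torch.no_grad():
        for x, _ in loader:
            model(x)
    hook.remove()
    return torch.cat(feats).numpy()

def fit_gaussian(acts):
    """Stand-in density estimator: a single Gaussian over the activations
    (DAAIN learns a normalizing flow, which is far more flexible)."""
    mu = acts.mean(axis=0)
    cov = np.cov(acts, rowvar=False) + 1e-3 * np.eye(acts.shape[1])
    return mu, cov

def log_density(act, mu, cov):
    """Inputs whose activation log-density falls below a quantile chosen on
    held-out in-distribution data are flagged as OOD or adversarial."""
    return multivariate_normal.logpdf(act, mean=mu, cov=cov)
```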
arXiv Detail & Related papers (2021-05-30T22:07:13Z) - pseudo-Bayesian Neural Networks for detecting Out of Distribution Inputs [12.429095025814345]
We propose pseudo-BNNs where instead of learning distributions over weights, we use point estimates and perturb weights at the time of inference.
Overall, this combination results in a principled technique to detect OOD samples at the time of inference.
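One illustrative reading of the entry, sketched below: keep the trained point-estimate weights, add Gaussian noise to them at inference time, and use the variance of the resulting predictions as an OOD score. The noise scale, number of samples, and scoring rule are assumptions.

```python
import torch

@torch.no_grad()
def perturbed_predictions(model, x, noise_std=0.01, num_samples=20):
    """Perturb the point-estimate weights with Gaussian noise at inference
    time and collect the resulting softmax outputs."""
    outputs = []
    state = {k: v.clone() for k, v in model.state_dict().items()}
    for _ in range(num_samples):
        noisy = {k: v + noise_std * torch.randn_like(v) if v.is_floating_point() else v
                 for k, v in state.items()}
        model.load_state_dict(noisy)
        outputs.append(torch.softmax(model(x), dim=-1))
    model.load_state_dict(state)          # restore the original weights
    return torch.stack(outputs)

def ood_score(model, x):
    """High variance across the perturbed forward passes signals an OOD input."""
    probs = perturbed_predictions(model, x)
    return probs.var(dim=0).sum(dim=-1)
```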
arXiv Detail & Related papers (2021-02-02T06:23:04Z) - Probing Predictions on OOD Images via Nearest Categories [97.055916832257]
We study out-of-distribution (OOD) prediction behavior of neural networks when they classify images from unseen classes or corrupted images.
We introduce a new measure, nearest category generalization (NCG), where we compute the fraction of OOD inputs that are classified with the same label as their nearest neighbor in the training set.
We find that robust networks have consistently higher NCG accuracy than natural training, even when the OOD data is much farther away than the robustness radius.
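A direct transcription of the NCG measure as defined in this entry; the representation in which the nearest-neighbor search runs and the Euclidean metric are assumptions.

```python
import numpy as np

def nearest_category_generalization(train_X, train_labels, ood_X, ood_preds):
    """Fraction of OOD inputs whose predicted label matches the label of
    their nearest neighbor in the training set."""
    matches = 0
    for x, pred in zip(ood_X, ood_preds):
        nn_idx = np.argmin(np.linalg.norm(train_X - x, axis=1))
        matches += int(train_labels[nn_idx] == pred)
    return matches / len(ood_X)
```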
arXiv Detail & Related papers (2020-11-17T07:42:27Z) - Adversarial Robustness Guarantees for Random Deep Neural Networks [15.68430580530443]
Adversarial examples are incorrectly classified inputs that are extremely close to a correctly classified input.
We prove that for any $p \ge 1$, the $\ell^p$ distance of any given input from the classification boundary scales as one over the square root of the dimension of the input times the $\ell^p$ norm of the input.
The results constitute a fundamental advance in the theoretical understanding of adversarial examples, and open the way to a thorough theoretical characterization of the relation between network architecture and robustness to adversarial perturbations.
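Restated in symbols, with $n$ the input dimension and $d_p(x)$ the $\ell^p$ distance of an input $x$ from the classification boundary, the scaling result above reads:

```latex
d_p(x) \;\sim\; \frac{\|x\|_p}{\sqrt{n}} \qquad \text{for any } p \ge 1 .
```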
arXiv Detail & Related papers (2020-04-13T13:07:26Z) - Exploiting Verified Neural Networks via Floating Point Numerical Error [15.639601066641099]
Verifiers aspire to answer whether a neural network guarantees certain properties with respect to all inputs in a space.
We show that the negligence of floating point error leads to unsound verification that can be systematically exploited in practice.
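A self-contained illustration of the underlying numerical issue: floating point addition is not associative, so an idealized real-arithmetic analysis and the deployed implementation can disagree near a decision boundary. The three-term example below is generic, not taken from the paper.

```python
# Floating point addition is not associative; a verifier that reasons over
# the reals (or sums in a different order than the deployed network) can
# certify a margin that the actual implementation violates.
a, b, c = 1e16, 1.0, -1e16
print((a + b) + c)   # 0.0 -- the small term is absorbed by the large one
print((a + c) + b)   # 1.0 -- cancelling the large terms first preserves it
# Near a decision boundary, a discrepancy of this size is enough to flip the
# predicted class, which is the gap the paper exploits.
```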
arXiv Detail & Related papers (2020-03-06T03:58:26Z) - Utilizing Network Properties to Detect Erroneous Inputs [0.76146285961466]
We train a linear SVM classifier to detect erroneous data using hidden and softmax feature vectors of pre-trained neural networks.
Our results indicate that these faulty data types generally exhibit linearly separable activation properties from correct examples.
We experimentally validate our findings across a diverse range of datasets, domains, pre-trained models, and adversarial attacks.
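A minimal sketch of the detector this entry describes, assuming feature vectors have already been extracted from a pre-trained network; the `feats`/`labels` layout and the LinearSVC hyperparameters are assumptions.

```python
from sklearn.svm import LinearSVC

def train_error_detector(feats, labels):
    """feats: hidden/softmax feature vectors from a pre-trained network,
    labels: 1 for erroneous inputs (misclassified or adversarial), 0 for correct."""
    detector = LinearSVC(C=1.0, max_iter=10000)
    detector.fit(feats, labels)
    return detector

def flag_erroneous(detector, feats):
    return detector.predict(feats) == 1   # True -> reject the input
```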
arXiv Detail & Related papers (2020-02-28T03:20:55Z)