Getting a-Round Guarantees: Floating-Point Attacks on Certified Robustness
- URL: http://arxiv.org/abs/2205.10159v5
- Date: Mon, 9 Sep 2024 06:47:34 GMT
- Title: Getting a-Round Guarantees: Floating-Point Attacks on Certified Robustness
- Authors: Jiankai Jin, Olga Ohrimenko, Benjamin I. P. Rubinstein
- Abstract summary: Adversarial examples pose a security risk as they can alter decisions of a machine learning classifier through slight input perturbations.
We show that these guarantees can be invalidated due to limitations of floating-point representation that cause rounding errors.
We show that the attack can be carried out against linear classifiers that have exact certifiable guarantees and against neural networks that have conservative certifications.
- Score: 19.380453459873298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples pose a security risk as they can alter decisions of a machine learning classifier through slight input perturbations. Certified robustness has been proposed as a mitigation where given an input $\mathbf{x}$, a classifier returns a prediction and a certified radius $R$ with a provable guarantee that any perturbation to $\mathbf{x}$ with $R$-bounded norm will not alter the classifier's prediction. In this work, we show that these guarantees can be invalidated due to limitations of floating-point representation that cause rounding errors. We design a rounding search method that can efficiently exploit this vulnerability to find adversarial examples against state-of-the-art certifications in two threat models that differ in how the norm of the perturbation is computed. We show that the attack can be carried out against linear classifiers that have exact certifiable guarantees and against neural networks that have conservative certifications. In the weak threat model, our experiments demonstrate attack success rates over 50% on random linear classifiers, up to 23% on the MNIST dataset for linear SVM, and up to 15% for a neural network. In the strong threat model, the success rates are lower but positive. The floating-point errors exploited by our attacks can range from small to large (e.g., $10^{-13}$ to $10^{3}$), showing that even negligible errors can be systematically exploited to invalidate guarantees provided by certified robustness. Finally, we propose a formal mitigation approach based on rounded interval arithmetic, encouraging future implementations of robustness certificates to account for limitations of modern computing architecture to provide sound certifiable guarantees.
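To make the abstract's point concrete, here is a minimal sketch, not the authors' implementation, of why a floating-point certified radius can be unsound and how outward-rounded interval arithmetic can guard against it. It assumes a binary linear classifier $\mathrm{sign}(\mathbf{w}^\top\mathbf{x} + b)$ whose exact $\ell_2$ certified radius is $R = |\mathbf{w}^\top\mathbf{x} + b| / \|\mathbf{w}\|_2$. All function names (`naive_radius`, `sound_radius`, `exact_radius_sq`, `iv_add`, `iv_mul`) are hypothetical, and the outward rounding is emulated with `math.nextafter` (Python 3.9+) rather than hardware rounding modes.

```python
import math
from fractions import Fraction


def down(v: float) -> float:
    """Next float64 below v (outward rounding for a lower bound)."""
    return math.nextafter(v, -math.inf)


def up(v: float) -> float:
    """Next float64 above v (outward rounding for an upper bound)."""
    return math.nextafter(v, math.inf)


def iv_add(a, b):
    """Interval sum, widened by one float on each side."""
    return (down(a[0] + b[0]), up(a[1] + b[1]))


def iv_mul(a, b):
    """Interval product, widened by one float on each side."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (down(min(products)), up(max(products)))


def naive_radius(w, b, x):
    """R = |w.x + b| / ||w||_2 in plain float64; rounding may push it above the true R."""
    score = b
    for wi, xi in zip(w, x):
        score += wi * xi
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm


def sound_radius(w, b, x):
    """Lower bound on R: smallest possible |score| over largest possible norm."""
    score = (b, b)
    norm_sq = (0.0, 0.0)
    for wi, xi in zip(w, x):
        score = iv_add(score, iv_mul((wi, wi), (xi, xi)))
        norm_sq = iv_add(norm_sq, iv_mul((wi, wi), (wi, wi)))
    if score[0] > 0.0:
        abs_lo = score[0]
    elif score[1] < 0.0:
        abs_lo = -score[1]
    else:
        abs_lo = 0.0  # the interval straddles zero, so nothing can be certified
    norm_hi = up(math.sqrt(norm_sq[1]))
    return max(0.0, down(abs_lo / norm_hi))


def exact_radius_sq(w, b, x):
    """R squared, computed exactly with rationals over the stored float values."""
    score = Fraction(b) + sum(Fraction(wi) * Fraction(xi) for wi, xi in zip(w, x))
    norm_sq = sum(Fraction(wi) ** 2 for wi in w)
    return score * score / norm_sq


w, b, x = [0.1, 0.2, 0.3], 0.05, [1.0, 2.0, 3.0]
print("naive:", naive_radius(w, b, x), "sound lower bound:", sound_radius(w, b, x))
```

Comparing `Fraction(r) ** 2` against `exact_radius_sq(w, b, x)` is one way to check whether a reported radius `r` exceeds the true one. The interval version gives up a few units in the last place of radius in exchange for never over-claiming, which is the trade-off the proposed mitigation accepts.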
Related papers
- Verifiably Robust Conformal Prediction [1.391198481393699]
This paper introduces VRCP (Verifiably Robust Conformal Prediction), a new framework that leverages neural network verification methods to recover coverage guarantees under adversarial attacks.
Our method is the first to support perturbations bounded by arbitrary norms including $\ell_1$, $\ell_2$, and $\ell_\infty$, as well as regression tasks.
In every case, VRCP achieves above nominal coverage and yields significantly more efficient and informative prediction regions than the SotA.
arXiv Detail & Related papers (2024-05-29T09:50:43Z) - The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing [85.85160896547698]
Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks.
We show how to design an efficient classifier with a certified radius by relying on noise injection into the inputs.
Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
arXiv Detail & Related papers (2023-09-28T22:41:47Z) - CC-Cert: A Probabilistic Approach to Certify General Robustness of
Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z) - Almost Tight L0-norm Certified Robustness of Top-k Predictions against
Adversarial Perturbations [78.23408201652984]
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web searches.
Our work is based on randomized smoothing, which builds a provably robust classifier via randomizing an input.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
arXiv Detail & Related papers (2020-11-15T21:34:44Z) - Certifying Confidence via Randomized Smoothing [151.67113334248464]
Randomized smoothing has been shown to provide good certified-robustness guarantees for high-dimensional classification problems.
Most smoothing methods do not give us any information about the confidence with which the underlying classifier makes a prediction.
We propose a method to generate certified radii for the prediction confidence of the smoothed classifier.
arXiv Detail & Related papers (2020-09-17T04:37:26Z) - Detection as Regression: Certified Object Detection by Median Smoothing [50.89591634725045]
This work is motivated by recent progress on certified classification by randomized smoothing.
We obtain the first model-agnostic, training-free, and certified defense for object detection against $\ell_2$-bounded attacks.
arXiv Detail & Related papers (2020-07-07T18:40:19Z) - Extensions and limitations of randomized smoothing for robustness guarantees [13.37805637358556]
We study how the choice of divergence between smoothing measures affects the final robustness guarantee.
We develop a method to certify robustness against any $\ell_p$ ($p \in \mathbb{N}_{>0}$) minimized adversarial perturbation.
arXiv Detail & Related papers (2020-06-07T17:22:32Z) - Regularized Training and Tight Certification for Randomized Smoothed Classifier with Provable Robustness [15.38718018477333]
We derive a new regularized risk, in which the regularizer can adaptively encourage the accuracy and robustness of the smoothed counterpart.
We also design a new certification algorithm, which can leverage the regularization effect to provide a tighter robustness lower bound that holds with high probability.
arXiv Detail & Related papers (2020-02-17T20:54:34Z) - Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)