CC-Cert: A Probabilistic Approach to Certify General Robustness of
Neural Networks
- URL: http://arxiv.org/abs/2109.10696v1
- Date: Wed, 22 Sep 2021 12:46:04 GMT
- Title: CC-Cert: A Probabilistic Approach to Certify General Robustness of
Neural Networks
- Authors: Mikhail Pautov, Nurislam Tursynbek, Marina Munkhoeva, Nikita Muravev,
Aleksandr Petiushko, Ivan Oseledets
- Abstract summary: In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
- Score: 58.29502185344086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In safety-critical machine learning applications, it is crucial to defend
models against adversarial attacks -- small modifications of the input that
change the predictions. Besides rigorously studied $\ell_p$-bounded additive
perturbations, recently proposed semantic perturbations (e.g., rotation,
translation) raise serious concerns about deploying ML systems in the real world.
Therefore, it is important to provide provable guarantees for deep learning
models against semantically meaningful input transformations. In this paper, we
propose a new universal probabilistic certification approach based on
Chernoff-Cramer bounds that can be used in general attack settings. We estimate
the probability that a model fails when the attack is sampled from a given
distribution. Our theoretical findings are supported by experimental results on
different datasets.
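The following is a minimal sketch (illustrative only, not the authors' released code) of how such a probabilistic certificate can be computed: sample attacks from the chosen distribution, record a per-sample failure score $S$, and apply the Chernoff-Cramer bound $P(S \ge t) \le \min_{\lambda > 0} E[e^{\lambda S}] e^{-\lambda t}$ with the expectation replaced by an empirical average. The failure scores and threshold below are illustrative placeholders, and a rigorous certificate would additionally need a concentration argument for the empirical moment generating function.

```python
# Minimal sketch (illustrative, not the paper's released implementation):
# estimate a Chernoff-style upper bound on the probability that a classifier
# fails when an attack is sampled from a fixed distribution.
import numpy as np

def chernoff_failure_bound(scores, threshold=0.0,
                           lambdas=np.linspace(0.01, 10.0, 200)):
    """Chernoff-Cramer bound P(S >= t) <= min_l E[exp(l*S)] * exp(-l*t),
    with the expectation replaced by an empirical average over sampled attacks.

    scores    : array of failure scores S_i, one per sampled perturbation;
                S_i >= threshold means the model misclassified that sample.
    threshold : failure threshold t.
    """
    scores = np.asarray(scores, dtype=np.float64)
    # Empirical moment generating function on a grid of candidate lambdas.
    mgf = np.exp(lambdas[:, None] * scores[None, :]).mean(axis=1)
    bounds = mgf * np.exp(-lambdas * threshold)
    return float(np.clip(bounds.min(), 0.0, 1.0))

# Hypothetical usage: scores[i] could be the negative classification margin of
# the model on a randomly rotated input; here random numbers stand in for them.
rng = np.random.default_rng(0)
scores = rng.normal(loc=-2.0, scale=1.0, size=10_000)
print("estimated failure probability <=", chernoff_failure_bound(scores))
```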
Related papers
- Counterfactual Explanations with Probabilistic Guarantees on their Robustness to Model Change [4.239829789304117]
Counterfactual explanations (CFEs) guide users on how to adjust inputs to machine learning models to achieve desired outputs.
Current methods addressing robustness to model change often support only specific models or change types.
This paper proposes a novel approach for generating CFEs that provides probabilistic guarantees for any model and change type.
arXiv Detail & Related papers (2024-08-09T03:35:53Z) - FACADE: A Framework for Adversarial Circuit Anomaly Detection and
Evaluation [9.025997629442896]
FACADE is designed for unsupervised mechanistic anomaly detection in deep neural networks.
Our approach seeks to improve model robustness and enhance scalable model oversight, and it demonstrates promising applications in real-world deployment settings.
arXiv Detail & Related papers (2023-07-20T04:00:37Z) - Robust Transferable Feature Extractors: Learning to Defend Pre-Trained
Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z) - Improved and Interpretable Defense to Transferred Adversarial Examples
by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR).
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.
arXiv Detail & Related papers (2022-07-09T01:06:41Z) - NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural
Networks [151.03112356092575]
We show a principled way to measure the uncertainty of predictions for a classifier, based on the Nadaraya-Watson nonparametric estimate of the conditional label distribution (a minimal sketch follows this list).
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty
Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Ramifications of Approximate Posterior Inference for Bayesian Deep
Learning in Adversarial and Out-of-Distribution Settings [7.476901945542385]
We show that Bayesian deep learning models can, on certain occasions, marginally outperform conventional neural networks.
Preliminary investigations indicate that bias arising from choices of initialisation, architecture, or activation functions may play an inherent role.
arXiv Detail & Related papers (2020-09-03T16:58:15Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
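As referenced in the NUQ entry above, here is a minimal sketch of a Nadaraya-Watson estimate of the conditional label distribution, from which an uncertainty score can be read off; it illustrates the general idea under assumed names (reference features, RBF kernel, bandwidth) and is not the NUQ paper's exact procedure.

```python
# Minimal sketch (my reading of the one-line summary above, not NUQ's exact
# method): Nadaraya-Watson estimate of p(y | x) from a labelled reference set,
# using an RBF kernel in an assumed feature space; uncertainty is then, e.g.,
# one minus the largest estimated class probability.
import numpy as np

def nadaraya_watson_label_dist(x, X_ref, y_ref, n_classes, bandwidth=1.0):
    """Kernel-weighted frequency of the labels around the query point x."""
    d2 = np.sum((X_ref - x) ** 2, axis=1)          # squared distances to x
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))       # RBF kernel weights
    onehot = np.eye(n_classes)[y_ref]              # (n, n_classes) one-hot labels
    return w @ onehot / (w.sum() + 1e-12)          # normalised weighted counts

# Hypothetical usage on random 2-D features with 3 classes.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(500, 2))
y_ref = rng.integers(0, 3, size=500)
p = nadaraya_watson_label_dist(np.zeros(2), X_ref, y_ref, n_classes=3)
print("p(y|x) =", p, "uncertainty =", 1.0 - p.max())
```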
This list is automatically generated from the titles and abstracts of the papers in this site.