PROSAC: Provably Safe Certification for Machine Learning Models under
Adversarial Attacks
- URL: http://arxiv.org/abs/2402.02629v1
- Date: Sun, 4 Feb 2024 22:45:20 GMT
- Title: PROSAC: Provably Safe Certification for Machine Learning Models under
Adversarial Attacks
- Authors: Ziquan Liu, Zhuo Zhi, Ilija Bogunovic, Carsten Gerner-Beuerle, Miguel
Rodrigues
- Abstract summary: State-of-the-art machine learning models can be seriously compromised by adversarial perturbations.
We propose a new approach to certify the performance of machine learning models in the presence of adversarial attacks.
- Score: 20.73708921078335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is widely known that state-of-the-art machine learning models, including
vision and language models, can be seriously compromised by adversarial
perturbations. It is therefore increasingly relevant to develop capabilities to
certify their performance in the presence of the most effective adversarial
attacks. Our paper offers a new approach to certify the performance of machine
learning models in the presence of adversarial attacks with population level
risk guarantees. In particular, we introduce the notion of $(\alpha,\zeta)$
machine learning model safety. We propose a hypothesis-testing procedure, based on the
availability of a calibration set, that delivers the following statistical guarantee: the
probability of declaring that the adversarial (population) risk of a machine learning model
is less than $\alpha$ (i.e. that the model is safe) while the model is in fact unsafe
(i.e. its adversarial population risk is higher than $\alpha$) is less than $\zeta$. We also propose Bayesian
optimization algorithms to determine efficiently whether a machine learning
model is $(\alpha,\zeta)$-safe in the presence of an adversarial attack, along
with statistical guarantees. We apply our framework to a range of machine
learning models including various sizes of vision Transformer (ViT) and ResNet
models impaired by a variety of adversarial attacks, such as AutoAttack,
SquareAttack and natural evolution strategy attack, to illustrate the operation
of our approach. Importantly, we show that ViTs are generally more robust to
adversarial attacks than ResNets, and ViT-large is more robust than smaller
models. Our approach goes beyond existing empirical adversarial risk-based
certification guarantees. It formulates rigorous (and provable) performance
guarantees that can be used to satisfy regulatory requirements mandating the
use of state-of-the-art technical tools.
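To make the certification criterion in the abstract concrete: writing $R_{\mathrm{adv}}$ for the adversarial population risk of the model under a given attack, the procedure declares the model safe only when it can assert $R_{\mathrm{adv}} < \alpha$ subject to the guarantee $\Pr(\text{declare safe} \mid R_{\mathrm{adv}} \geq \alpha) \leq \zeta$. The sketch below is a minimal, hypothetical illustration of this kind of test, not the paper's actual algorithm: it assumes 0/1 adversarial losses collected on an i.i.d. calibration set under a fixed attack and uses a simple one-sided binomial test; the function name and the use of scipy are assumptions for illustration, and PROSAC additionally tunes the attack's hyperparameters with Bayesian optimization before certifying.
```python
# Hypothetical sketch of an (alpha, zeta)-safety check (illustrative only, not
# the paper's exact procedure). Given 0/1 adversarial losses on an i.i.d.
# calibration set, test H0: adversarial risk >= alpha, and declare the model
# safe only when H0 can be rejected at level zeta.
from scipy.stats import binom


def declare_alpha_zeta_safe(adv_errors, alpha: float, zeta: float) -> bool:
    """adv_errors: 0/1 loss per calibration point under the chosen attack."""
    n = len(adv_errors)
    k = sum(adv_errors)  # calibration points the attack successfully breaks
    # Under H0 (true adversarial risk >= alpha), observing k or fewer errors
    # is no more likely than P[Binomial(n, alpha) <= k], which is the p-value.
    p_value = binom.cdf(k, n, alpha)
    # Rejecting H0 only when p_value <= zeta bounds the probability of
    # declaring an unsafe model safe by zeta.
    return p_value <= zeta


# Example: 9 adversarial errors on 1000 calibration points with alpha = 0.05
# and zeta = 0.05 -> the model is declared (0.05, 0.05)-safe for this attack.
if __name__ == "__main__":
    losses = [1] * 9 + [0] * 991
    print(declare_alpha_zeta_safe(losses, alpha=0.05, zeta=0.05))
```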
Related papers
- A Hybrid Defense Strategy for Boosting Adversarial Robustness in Vision-Language Models [9.304845676825584]
We propose a novel adversarial training framework that integrates multiple attack strategies and advanced machine learning techniques.
Experiments conducted on real-world datasets, including CIFAR-10 and CIFAR-100, demonstrate that the proposed method significantly enhances model robustness.
arXiv Detail & Related papers (2024-10-18T23:47:46Z)
- Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z)
- SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models.
Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z)
- Improved Membership Inference Attacks Against Language Classification Models [0.0]
We present a novel framework for running membership inference attacks against classification models.
We show that this approach achieves higher accuracy than either a single attack model or an attack model per class label.
arXiv Detail & Related papers (2023-10-11T06:09:48Z)
- AUTOLYCUS: Exploiting Explainable AI (XAI) for Model Extraction Attacks against Interpretable Models [1.8752655643513647]
XAI tools can increase models' vulnerability to model extraction attacks, which is a concern when model owners prefer black-box access.
We propose a novel retraining (learning) based model extraction attack framework against interpretable models under black-box settings.
We show that AUTOLYCUS is highly effective, requiring significantly fewer queries compared to state-of-the-art attacks.
arXiv Detail & Related papers (2023-02-04T13:23:39Z)
- Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models [103.71308117592963]
We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show MLAC can largely prevent a BERT-style model from being re-purposed to perform gender identification.
arXiv Detail & Related papers (2022-11-27T21:43:45Z)
- Careful What You Wish For: on the Extraction of Adversarially Trained Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z)
- CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular, reusable software tool, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
- RAB: Provable Robustness Against Backdoor Attacks [20.702977915926787]
We focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks.
We propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks.
We conduct comprehensive experiments for different machine learning (ML) models and provide the first benchmark for certified robustness against backdoor attacks.
arXiv Detail & Related papers (2020-03-19T17:05:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.