How to Robustify Black-Box ML Models? A Zeroth-Order Optimization
Perspective
- URL: http://arxiv.org/abs/2203.14195v1
- Date: Sun, 27 Mar 2022 03:23:32 GMT
- Title: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization
Perspective
- Authors: Yimeng Zhang, Yuguang Yao, Jinghan Jia, Jinfeng Yi, Mingyi Hong, Shiyu
Chang, Sijia Liu
- Abstract summary: We address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback?
We propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS).
We empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines.
- Score: 74.47093382436823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The lack of adversarial robustness has been recognized as an important issue
for state-of-the-art machine learning (ML) models, e.g., deep neural networks
(DNNs). Consequently, robustifying ML models against adversarial attacks is now
a major focus of research. However, nearly all existing defense methods,
particularly those for robust training, make the white-box assumption that the
defender has access to the details of the ML model (or its surrogate
alternatives, if available), e.g., its architecture and parameters. Going
beyond existing work, in this paper we address the problem of black-box
defense: how to robustify a black-box model using only input queries and
output feedback? Such a problem arises in practical scenarios where the owner
of the predictive model is reluctant to share model information in order to
preserve privacy. To this end, we propose a general notion of defensive
operation that can be applied to black-box models, and design it through the
lens of denoised smoothing (DS), a first-order (FO) certified defense
technique. To enable a design that relies only on model queries, we further
integrate DS with zeroth-order (gradient-free) optimization. However, a direct
implementation of zeroth-order (ZO) optimization suffers from high variance in
its gradient estimates and thus leads to an ineffective defense. To tackle
this problem, we propose to prepend an autoencoder (AE) to the given
(black-box) model so that DS can be trained using variance-reduced ZO
optimization. We term the resulting defense ZO-AE-DS. We empirically show that
ZO-AE-DS achieves improved accuracy, certified robustness, and query
complexity over existing baselines, and its effectiveness is demonstrated on
both image classification and image reconstruction tasks. Code is available at
https://github.com/damon-demon/Black-Box-Defense.
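
The abstract's key mechanism is zeroth-order (ZO) optimization: without access to the model's gradients, the defender estimates them from input queries and output feedback alone, and prepending an autoencoder keeps that estimation low-dimensional enough for the variance to stay manageable. The sketch below is a minimal illustration of a standard two-point randomized gradient estimator, not the implementation from the linked repository; the names zo_gradient and loss_fn and the hyperparameter values are assumptions made for this example.

```python
import numpy as np

# Minimal sketch of two-point randomized zeroth-order gradient estimation.
# Illustrative only: the ZO-AE-DS paper uses its own variance-reduced
# estimator inside a full training loop.  Here `loss_fn` stands for any
# query-only objective, e.g. a denoised-smoothing loss obtained by sending
# the denoised input through the black-box model.

def zo_gradient(loss_fn, z, mu=0.005, num_queries=20):
    """Estimate d loss / d z for a 1-D vector z using only forward
    evaluations of loss_fn (no backpropagation through the model)."""
    d = z.size
    grad = np.zeros(d)
    for _ in range(num_queries):
        u = np.random.randn(d)                 # random probe direction
        u /= np.linalg.norm(u)
        delta = loss_fn(z + mu * u) - loss_fn(z - mu * u)
        grad += (d * delta / (2.0 * mu)) * u   # finite-difference estimate
    return grad / num_queries

# The variance of this estimator grows with the dimension d, which is
# presumably why estimating gradients at the low-dimensional bottleneck of a
# prepended autoencoder, rather than at raw pixels, makes the query-only
# training in ZO-AE-DS tractable.

# Hypothetical usage: z could be the autoencoder's latent code for one image,
# and loss_fn(z) the loss of the black-box prediction on the decoded input.
```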
Related papers
- Black-Box Forgetting [8.84485103053191]
We address a novel problem of selective forgetting for black-box models, named Black-Box Forgetting.
We propose Latent Context Sharing, which introduces common low-dimensional latent components among multiple tokens for the prompt.
Experiments on four standard benchmark datasets demonstrate the superiority of our method over reasonable baselines.
arXiv Detail & Related papers (2024-11-01T07:10:40Z)
- Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior [36.101904669291436]
This paper studies the challenging black-box adversarial attack, which aims to generate adversarial examples against a black-box model using only the model's output feedback to input queries.
We propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks.
Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior.
arXiv Detail & Related papers (2024-05-29T14:05:16Z)
- Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- DREAM: Domain-free Reverse Engineering Attributes of Black-box Model [51.37041886352823]
We propose a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target model.
We learn a domain-agnostic model to infer the attributes of a target black-box model with unknown training data.
arXiv Detail & Related papers (2023-07-20T16:25:58Z)
- T-SEA: Transfer-based Self-Ensemble Attack on Object Detection [9.794192858806905]
We propose a single-model transfer-based black-box attack on object detection, utilizing only one model to achieve a high-transferability adversarial attack on multiple black-box detectors.
We analogize patch optimization with regular model optimization, proposing a series of self-ensemble approaches on the input data, the attacked model, and the adversarial patch.
arXiv Detail & Related papers (2022-11-16T10:27:06Z)
- Attackar: Attack of the Evolutionary Adversary [0.0]
This paper introduces Attackar, an evolutionary, score-based, black-box attack.
Attackar is based on a novel objective function that can be used in gradient-free optimization problems.
Our results demonstrate the superior performance of Attackar, both in terms of accuracy score and query efficiency.
arXiv Detail & Related papers (2022-08-17T13:57:23Z)
- Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains.
As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input.
Here, we examine several objective functions for constructing adversarial attacks, suggest metrics to assess model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z)
- Learning Black-Box Attackers with Transferable Priors and Query Feedback [40.41083684665537]
This paper addresses the challenging black-box adversarial attack problem, where only classification confidence of a victim model is available.
Inspired by consistency of visual saliency between different vision models, a surrogate model is expected to improve the attack performance via transferability.
We propose a surprisingly simple baseline approach (named SimBA++) using the surrogate model, which significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-10-21T05:43:11Z)
- Improving Query Efficiency of Black-box Adversarial Attack [75.71530208862319]
We propose a Neural Process based black-box adversarial attack (NP-Attack).
NP-Attack could greatly decrease the query counts under the black-box setting.
arXiv Detail & Related papers (2020-09-24T06:22:56Z)