How to Robustify Black-Box ML Models? A Zeroth-Order Optimization
Perspective
- URL: http://arxiv.org/abs/2203.14195v1
- Date: Sun, 27 Mar 2022 03:23:32 GMT
- Title: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization
Perspective
- Authors: Yimeng Zhang, Yuguang Yao, Jinghan Jia, Jinfeng Yi, Mingyi Hong, Shiyu
Chang, Sijia Liu
- Abstract summary: We address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback?
We propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS).
We empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines.
- Score: 74.47093382436823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The lack of adversarial robustness has been recognized as an important issue
for state-of-the-art machine learning (ML) models, e.g., deep neural networks
(DNNs). Consequently, robustifying ML models against adversarial attacks is now
a major focus of research. However, nearly all existing defense methods,
particularly those for robust training, make the white-box assumption that the
defender has access to the details of the ML model (or its surrogate
alternatives, if available), e.g., its architecture and parameters. Going
beyond existing work, in this paper we address the problem of black-box
defense: how to robustify a black-box model using only input queries and
output feedback? Such a problem arises in practical scenarios where the owner
of the predictive model is reluctant to share model information in order to
preserve privacy. To this end, we propose a general notion of defensive
operation that can be applied to black-box models, and design it through the
lens of denoised smoothing (DS), a first-order (FO) certified defense
technique. To enable a design that relies only on model queries, we further
integrate DS with zeroth-order (gradient-free) optimization. However, a direct
implementation of zeroth-order (ZO) optimization suffers from high variance in
its gradient estimates and thus leads to an ineffective defense. To tackle
this problem, we propose to prepend an autoencoder (AE) to the given
(black-box) model so that DS can be trained using variance-reduced ZO
optimization. We term the resulting defense ZO-AE-DS. We empirically show that
ZO-AE-DS achieves improved accuracy, certified robustness, and query
complexity over existing baselines, and its effectiveness is demonstrated on
both image classification and image reconstruction tasks. Code is available at
https://github.com/damon-demon/Black-Box-Defense.
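
The abstract's key mechanism is zeroth-order (ZO) optimization: without access to the model's gradients, the defender estimates them from input queries and output feedback alone, and prepending an autoencoder keeps that estimation low-dimensional enough for the variance to stay manageable. The sketch below is a minimal illustration of a standard two-point randomized gradient estimator, not the implementation from the linked repository; the names zo_gradient and loss_fn and the hyperparameter values are assumptions made for this example.

```python
import numpy as np

# Minimal sketch of two-point randomized zeroth-order gradient estimation.
# Illustrative only: the ZO-AE-DS paper uses its own variance-reduced
# estimator inside a full training loop.  Here `loss_fn` stands for any
# query-only objective, e.g. a denoised-smoothing loss obtained by sending
# the denoised input through the black-box model.

def zo_gradient(loss_fn, z, mu=0.005, num_queries=20):
    """Estimate d loss / d z for a 1-D vector z using only forward
    evaluations of loss_fn (no backpropagation through the model)."""
    d = z.size
    grad = np.zeros(d)
    for _ in range(num_queries):
        u = np.random.randn(d)                 # random probe direction
        u /= np.linalg.norm(u)
        delta = loss_fn(z + mu * u) - loss_fn(z - mu * u)
        grad += (d * delta / (2.0 * mu)) * u   # finite-difference estimate
    return grad / num_queries

# The variance of this estimator grows with the dimension d, which is
# presumably why estimating gradients at the low-dimensional bottleneck of a
# prepended autoencoder, rather than at raw pixels, makes the query-only
# training in ZO-AE-DS tractable.

# Hypothetical usage: z could be the autoencoder's latent code for one image,
# and loss_fn(z) the loss of the black-box prediction on the decoded input.
```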
Related papers
- Black-Box Forgetting [8.84485103053191]
We address a novel problem of selective forgetting for black-box models, named Black-Box Forgetting.
We propose Latent Context Sharing, which introduces common low-dimensional latent components among multiple tokens for the prompt.
Experiments on four standard benchmark datasets demonstrate the superiority of our method over reasonable baselines.
arXiv Detail & Related papers (2024-11-01T07:10:40Z)
- Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior [36.101904669291436]
This paper studies the challenging black-box adversarial attack, which aims to generate adversarial examples against a black-box model using only the model's output feedback to input queries.
We propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks.
Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior.
arXiv Detail & Related papers (2024-05-29T14:05:16Z)
- Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- DREAM: Domain-free Reverse Engineering Attributes of Black-box Model [51.37041886352823]
We propose a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target model.
We learn a domain-agnostic model to infer the attributes of a target black-box model with unknown training data.
arXiv Detail & Related papers (2023-07-20T16:25:58Z)
- T-SEA: Transfer-based Self-Ensemble Attack on Object Detection [9.794192858806905]
We propose a single-model transfer-based black-box attack on object detection, utilizing only one model to achieve a high-transferability adversarial attack on multiple black-box detectors.
We analogize patch optimization with regular model optimization, proposing a series of self-ensemble approaches on the input data, the attacked model, and the adversarial patch.
arXiv Detail & Related papers (2022-11-16T10:27:06Z)
- Attackar: Attack of the Evolutionary Adversary [0.0]
This paper introduces Attackar, an evolutionary, score-based, black-box attack.
Attackar is based on a novel objective function that can be used in gradient-free optimization problems.
Our results demonstrate the superior performance of Attackar, both in terms of accuracy score and query efficiency.
arXiv Detail & Related papers (2022-08-17T13:57:23Z)
- Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains.
As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input.
Here, we examine several objective functions for constructing adversarial attacks, suggest metrics to assess model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z)
- Learning Black-Box Attackers with Transferable Priors and Query Feedback [40.41083684665537]
This paper addresses the challenging black-box adversarial attack problem, where only classification confidence of a victim model is available.
Inspired by consistency of visual saliency between different vision models, a surrogate model is expected to improve the attack performance via transferability.
We propose a surprisingly simple baseline approach (named SimBA++) using the surrogate model, which significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-10-21T05:43:11Z)
- Improving Query Efficiency of Black-box Adversarial Attack [75.71530208862319]
We propose a Neural Process based black-box adversarial attack (NP-Attack).
NP-Attack could greatly decrease the query counts under the black-box setting.
arXiv Detail & Related papers (2020-09-24T06:22:56Z)