Related papers: Privacy-preserving Universal Adversarial Defense for Black-box Models

URL: http://arxiv.org/abs/2408.10647v1
Date: Tue, 20 Aug 2024 08:40:39 GMT
Title: Privacy-preserving Universal Adversarial Defense for Black-box Models
Authors: Qiao Li, Cong Wu, Jing Chen, Zijun Zhang, Kun He, Ruiying Du, Xinxin Wang, Qingchuang Zhao, Yang Liu
Abstract summary: We introduce DUCD, a universal black-box defense method that does not require access to the target model's parameters or architecture. Our approach involves distilling the target model by querying it with data, creating a white-box surrogate while preserving data privacy. Experiments on multiple image classification datasets show that DUCD not only outperforms existing black-box defenses but also matches the accuracy of white-box defenses.
Score: 20.968518031455503
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep neural networks (DNNs) are increasingly used in critical applications such as identity authentication and autonomous driving, where robustness against adversarial attacks is crucial. These attacks can exploit minor perturbations to cause significant prediction errors, making it essential to enhance the resilience of DNNs. Traditional defense methods often rely on access to detailed model information, which raises privacy concerns, as model owners may be reluctant to share such data. In contrast, existing black-box defense methods fail to offer a universal defense against various types of adversarial attacks. To address these challenges, we introduce DUCD, a universal black-box defense method that does not require access to the target model's parameters or architecture. Our approach involves distilling the target model by querying it with data, creating a white-box surrogate while preserving data privacy. We further enhance this surrogate model using a certified defense based on randomized smoothing and optimized noise selection, enabling robust defense against a broad range of adversarial attacks. Comparative evaluations between the certified defenses of the surrogate and target models demonstrate the effectiveness of our approach. Experiments on multiple image classification datasets show that DUCD not only outperforms existing black-box defenses but also matches the accuracy of white-box defenses, all while enhancing data privacy and reducing the success rate of membership inference attacks.
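
The abstract mentions a certified defense based on randomized smoothing applied to a surrogate model. The snippet below is a minimal sketch of the prediction step of generic randomized smoothing only: a majority vote over Gaussian-perturbed copies of the input. It is not DUCD's implementation. The names `smoothed_predict` and `toy_surrogate`, the noise level `sigma`, and the sample count are all assumptions made for illustration, and the paper's distillation step, optimized noise selection, and certification are omitted.

```python
import numpy as np


def smoothed_predict(classify, x, sigma=0.25, n_samples=1000, num_classes=2, seed=0):
    """Return the majority-vote label of `classify` under Gaussian input noise.

    `classify` maps a batch of inputs (n, ...) to integer labels of shape (n,).
    """
    rng = np.random.default_rng(seed)
    # Draw n_samples noisy copies of x: x + N(0, sigma^2 I).
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    labels = classify(x[None, ...] + noise)
    # Count the votes per class and return the most frequent label.
    counts = np.bincount(labels, minlength=num_classes)
    return int(np.argmax(counts)), counts


def toy_surrogate(batch):
    """Hypothetical stand-in for a surrogate model: class 1 if the feature mean is positive."""
    flat = batch.reshape(len(batch), -1)
    return (flat.mean(axis=1) > 0).astype(np.int64)


x = np.full(16, 0.5)  # toy input that lies well inside class 1
label, counts = smoothed_predict(toy_surrogate, x)
print(label, counts)
```

In a full randomized-smoothing pipeline the vote counts would also feed a statistical bound on the top-class probability, from which a certified radius is derived; that step is left out here.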

Related papers

MARS: A Malignity-Aware Backdoor Defense in Federated Learning [51.77354308287098]
A recently proposed state-of-the-art (SOTA) attack, 3DFed, uses an indicator mechanism to determine whether backdoor models have been accepted by the defender. We propose a Malignity-Aware backdooR defenSe (MARS) that leverages backdoor energy to indicate the malicious extent of each neuron. Experiments demonstrate that MARS can defend against SOTA backdoor attacks and significantly outperforms existing defenses.
arXiv Detail & Related papers (2025-09-21T14:50:02Z)
MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access. Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs. We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
Reformulation is All You Need: Addressing Malicious Text Features in DNNs [53.45564571192014]
We propose a unified and adaptive defense framework that is effective against both adversarial and backdoor attacks. Our framework outperforms existing sample-oriented defense baselines across a diverse range of malicious textual features.
arXiv Detail & Related papers (2025-02-02T03:39:43Z)
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack. When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks [23.010308600769545]
Deep neural networks are vulnerable to adversarial examples: samples that lie close to the original image yet cause the model to misclassify. We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time. Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
arXiv Detail & Related papers (2023-10-01T03:53:23Z)
Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources. FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks. We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously. MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence [34.35162562625252]
Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models. We study a new paradigm of black-box attacks with provable guarantees. This new black-box attack unveils significant vulnerabilities of machine learning models.
arXiv Detail & Related papers (2023-04-10T01:12:09Z)
How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective [74.47093382436823]
We address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback? We propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS). We empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines.
arXiv Detail & Related papers (2022-03-27T03:23:32Z)
One Parameter Defense -- Defending against Data Inference Attacks via Differential Privacy [26.000487178636927]
Machine learning models are vulnerable to data inference attacks, such as membership inference and model inversion attacks. Most existing defense methods only protect against membership inference attacks. We propose a differentially private defense method that handles both types of attacks in a time-efficient manner.
arXiv Detail & Related papers (2022-03-13T06:06:24Z)
LTU Attacker for Membership Inference [23.266710407178078]
We address the problem of defending predictive models against membership inference attacks. Both utility and privacy are evaluated with an external apparatus including an Attacker and an Evaluator. We prove that, under certain conditions, even a "naïve" LTU Attacker can achieve lower bounds on privacy loss with simple attack strategies.
arXiv Detail & Related papers (2022-02-04T18:06:21Z)
Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs). We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases. Experiments on benchmark datasets and attacks against a real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)
A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems. This paper proposes a self-supervised adversarial training mechanism in the input space. It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks. Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.