Differentially Private Counterfactuals via Functional Mechanism
- URL: http://arxiv.org/abs/2208.02878v1
- Date: Thu, 4 Aug 2022 20:31:22 GMT
- Title: Differentially Private Counterfactuals via Functional Mechanism
- Authors: Fan Yang, Qizhang Feng, Kaixiong Zhou, Jiahao Chen, Xia Hu
- Abstract summary: We propose a novel framework to generate differentially private counterfactuals (DPC) without touching the deployed model or explanation set.
In particular, we train an autoencoder with the functional mechanism to construct noisy class prototypes, and then derive the DPC from the latent prototypes.
- Score: 47.606474009932825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactuals, an emerging type of model explanation, have recently
attracted considerable attention from both industry and academia. Unlike
conventional feature-based explanations (e.g., attributions), counterfactuals
are hypothetical samples that flip a model's decision with minimal
perturbations to the query. Given valid counterfactuals, humans can reason
under ``what-if'' circumstances and thus better understand the model's decision
boundaries. However, releasing counterfactuals can be detrimental, since it may
unintentionally leak sensitive information to adversaries, raising risks to
both model security and data privacy. To bridge this gap, we propose a novel
framework that generates differentially private counterfactuals (DPC) without
touching the deployed model or the explanation set: noise is injected for
protection while the explanatory role of the counterfactual is preserved. In
particular, we train an autoencoder with the functional mechanism to construct
noisy class prototypes, and then derive the DPC from the latent prototypes by
exploiting the post-processing immunity of differential privacy. Further
evaluations demonstrate the effectiveness of the proposed framework, showing
that DPC successfully mitigates the risks of both extraction and inference
attacks.
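To make the described pipeline concrete, below is a minimal sketch of the two ingredients named in the abstract: the functional mechanism (Laplace noise added to the polynomial coefficients of the training objective) and a post-processing step that derives a counterfactual from the private parameters alone. It uses a linear scorer with a quadratic loss as a stand-in for the paper's autoencoder and noisy class prototypes; the function names, the [-1, 1] clipping, the conservative sensitivity bound 2(1 + d)^2, and the boundary-crossing recourse rule are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch of the two building blocks named in the abstract, using a
# linear scorer instead of the paper's autoencoder; every name and constant
# below is illustrative.
import numpy as np

rng = np.random.default_rng(0)

def functional_mechanism_linear(X, y, epsilon, ridge=1e-2):
    """Train an epsilon-DP linear scorer with the functional mechanism.

    Assumes each feature and label is clipped to [-1, 1], so the L1 norm of one
    record's polynomial coefficients is at most (1 + d)^2 and the replace-one
    sensitivity is bounded (conservatively) by 2 * (1 + d)^2.
    """
    n, d = X.shape
    X = np.clip(X, -1.0, 1.0)
    y = np.clip(y, -1.0, 1.0)

    # The quadratic loss sum_i (y_i - w.x_i)^2 expands into polynomial
    # coefficients of w: degree-1 terms lam1_j = -2 sum_i y_i x_ij and
    # degree-2 terms lam2_jl = sum_i x_ij x_il.
    lam1 = -2.0 * X.T @ y
    lam2 = X.T @ X

    # Functional mechanism: perturb the coefficients, not the gradients/output.
    sensitivity = 2.0 * (1.0 + d) ** 2
    scale = sensitivity / epsilon
    lam1_noisy = lam1 + rng.laplace(0.0, scale, size=lam1.shape)
    lam2_noisy = lam2 + rng.laplace(0.0, scale, size=lam2.shape)
    lam2_noisy = 0.5 * (lam2_noisy + lam2_noisy.T)  # re-symmetrize (post-processing)

    # Minimize the perturbed objective; a small ridge keeps the linear system
    # solvable even when the noise makes lam2_noisy indefinite.
    return np.linalg.solve(lam2_noisy + ridge * n * np.eye(d), -0.5 * lam1_noisy)

def counterfactual_from_private_model(w, x, margin=0.05):
    """Minimal L2 edit of query x that flips sign(w.x). It touches only the
    already-private weights w, so it stays private by post-processing immunity."""
    score = w @ x
    delta = -(score + np.sign(score) * margin) / (w @ w) * w
    return x + delta

# Toy usage: two Gaussian blobs labelled +1 / -1.
d = 5
X = np.vstack([rng.normal(0.4, 0.2, (200, d)), rng.normal(-0.4, 0.2, (200, d))])
y = np.concatenate([np.ones(200), -np.ones(200)])
w_priv = functional_mechanism_linear(X, y, epsilon=1.0)
x_query = X[0]
x_cf = counterfactual_from_private_model(w_priv, x_query)
print(np.sign(w_priv @ x_query), np.sign(w_priv @ x_cf))  # opposite signs by construction
```

Because the counterfactual is computed only from the already-noised parameters, it inherits the same privacy guarantee by post-processing immunity, which is the property the abstract relies on.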
Related papers
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained
Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z) - Bridging Differential Privacy and Byzantine-Robustness via Model
Aggregation [27.518542543750367]
This paper addresses two conflicting goals in federated learning: differential privacy and Byzantine robustness.
Standard DP mechanisms add noise to the transmitted messages, which entangles with the robust gradient aggregation used to defend against Byzantine attacks.
We show that the influence of the proposed DP mechanisms is decoupled from that of the robust model aggregation (a generic sketch of this noise-versus-robust-aggregation interplay appears after this list).
arXiv Detail & Related papers (2022-04-29T23:37:46Z) - CC-Cert: A Probabilistic Approach to Certify General Robustness of
Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z) - Harnessing Perceptual Adversarial Patches for Crowd Counting [92.79051296850405]
Crowd counting is vulnerable to adversarial examples in the physical world.
This paper proposes the Perceptual Adversarial Patch (PAP) generation framework to learn the shared perceptual features between models.
arXiv Detail & Related papers (2021-09-16T13:51:39Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Federated Model Distillation with Noise-Free Differential Privacy [35.72801867380072]
We propose a novel framework called FEDMD-NFDP, which applies a Noise-Free Differential Privacy (NFDP) mechanism into a federated model distillation framework.
Our extensive experimental results on various datasets validate that FEDMD-NFDP can deliver comparable utility and communication efficiency.
arXiv Detail & Related papers (2020-09-11T17:19:56Z) - Improving Robustness to Model Inversion Attacks via Mutual Information
Regularization [12.079281416410227]
This paper studies defense mechanisms against model inversion (MI) attacks.
MI is a type of privacy attack that aims to infer information about the training data distribution, given access to a target machine learning model.
We propose the Mutual Information Regularization based Defense (MID) against MI attacks.
arXiv Detail & Related papers (2020-09-11T06:02:44Z) - Mitigating Query-Flooding Parameter Duplication Attack on Regression
Models with High-Dimensional Gaussian Mechanism [12.017509695576377]
We show that an adversary can launch a query-flooding parameter duplication (QPD) attack to infer the model information.
Differential privacy (DP) has been considered a promising technique to mitigate this attack.
We propose a novel High-Dimensional Gaussian (HDG) mechanism to prevent unauthorized information disclosure.
arXiv Detail & Related papers (2020-02-06T01:47:08Z)
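The Bridging Differential Privacy and Byzantine-Robustness entry above points to the sketch below: a generic illustration of how per-client DP noise (gradient clipping plus Gaussian noise) interacts with a Byzantine-robust aggregator (coordinate-wise median). The clipping norm, noise scale, and aggregation rule are assumptions chosen for illustration, not the aggregation scheme proposed in that paper.

```python
# Hedged sketch: per-client DP noise (clip + Gaussian noise) followed by a
# Byzantine-robust aggregator (coordinate-wise median). Generic illustration
# only; clip_norm, sigma, and the median rule are assumptions.
import numpy as np

rng = np.random.default_rng(1)

def dp_noisy_update(grad, clip_norm=1.0, sigma=0.2):
    """Clip a client gradient to bound its sensitivity, then add Gaussian noise
    (the usual clip-and-noise recipe for approximate DP)."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, sigma * clip_norm, size=grad.shape)

def robust_aggregate(updates):
    """Coordinate-wise median: a simple Byzantine-robust aggregation rule."""
    return np.median(np.stack(updates), axis=0)

# Toy round: 8 honest clients share a common gradient, 2 Byzantine clients lie.
true_grad = np.ones(4)
honest = [dp_noisy_update(true_grad + rng.normal(0.0, 0.1, 4)) for _ in range(8)]
byzantine = [np.full(4, 100.0), np.full(4, -100.0)]
print(robust_aggregate(honest + byzantine))  # near the clipped honest gradient (~0.5 per coordinate)
```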