Honest-but-Curious Nets: Sensitive Attributes of Private Inputs can be
Secretly Coded into the Entropy of Classifiers' Outputs
- URL: http://arxiv.org/abs/2105.12049v1
- Date: Tue, 25 May 2021 16:27:57 GMT
- Title: Honest-but-Curious Nets: Sensitive Attributes of Private Inputs can be
Secretly Coded into the Entropy of Classifiers' Outputs
- Authors: Mohammad Malekzadeh and Anastasia Borovykh and Deniz G\"und\"uz
- Abstract summary: Deep neural networks, trained for the classification of a non-sensitive target attribute, can reveal sensitive attributes of their input data.
We show that deep classifiers can be trained to secretly encode a sensitive attribute of users' input data, at inference time, into their outputs for the target attribute.
- Score: 1.0742675209112622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is known that deep neural networks, trained for the classification of a
non-sensitive target attribute, can reveal sensitive attributes of their input
data through features of different granularity extracted by the classifier.
Taking a step forward, we show that deep classifiers can be trained to
secretly encode a sensitive attribute of users' input data, at inference time,
into the classifier's outputs for the target attribute. This attack works
even if users have a white-box view of the classifier and keep all internal
representations hidden, exposing only the classifier's estimation of the
target attribute. We introduce an information-theoretical formulation of such
adversaries and present efficient empirical implementations for training
honest-but-curious (HBC) classifiers based on this formulation: deep models
that can be accurate in predicting the target attribute, but also can utilize
their outputs to secretly encode a sensitive attribute. Our evaluations on
several tasks in real-world datasets show that a semi-trusted server can build
a classifier that is not only perfectly honest but also accurately curious. Our
work highlights a vulnerability that can be exploited by malicious machine
learning service providers to attack their users' privacy in several seemingly
safe scenarios, such as encrypted inferences, computations at the edge, or
private knowledge distillation. We conclude by showing the difficulties in
distinguishing between standard and HBC classifiers and discussing potential
proactive defenses against this vulnerability of deep classifiers.
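A minimal sketch of the idea (not the authors' exact information-theoretical formulation; the architecture, loss weighting, and toy data below are illustrative assumptions): the classifier is trained with an ordinary loss on the target attribute, while an auxiliary decoder that sees only the released softmax vector is trained jointly to recover the sensitive attribute.

```python
# Illustrative sketch of an honest-but-curious (HBC) classifier; the joint
# loss and network sizes are assumptions, not the paper's exact recipe.
import torch
import torch.nn as nn

n_target, n_sensitive, in_dim = 10, 2, 32

classifier = nn.Sequential(                  # released model: input -> target logits
    nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_target))
decoder = nn.Linear(n_target, n_sensitive)   # curious party reads only the softmax outputs
opt = torch.optim.Adam(list(classifier.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(256, in_dim)                          # toy inputs
y_target = torch.randint(0, n_target, (256,))         # non-sensitive label (honest task)
y_sensitive = torch.randint(0, n_sensitive, (256,))   # label to be secretly encoded

for _ in range(200):
    probs = torch.softmax(classifier(x), dim=-1)      # the only thing users ever release
    honest = nn.functional.nll_loss(torch.log(probs + 1e-9), y_target)
    curious = nn.functional.cross_entropy(decoder(probs), y_sensitive)
    loss = honest + curious                           # trade-off weight of 1.0 is an assumption
    opt.zero_grad(); loss.backward(); opt.step()

# At inference, the server applies `decoder` to the released soft predictions to
# recover the sensitive attribute, while the target predictions remain accurate.
```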
Related papers
- From Generalist to Specialist: Exploring CWE-Specific Vulnerability Detection [1.9249287163937974]
Each Common Weakness Enumeration (CWE) represents a unique category of vulnerabilities with distinct characteristics, code semantics, and patterns.
Treating all vulnerabilities as a single label with a binary classification approach may oversimplify the problem.
arXiv Detail & Related papers (2024-08-05T09:12:39Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic and generalisable framework for which key behaviours observed in practical systems arise with high probability.
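As a concrete, hedged illustration of the phenomenon (a standard FGSM-style perturbation, not the framework proposed in the paper): in high-dimensional input spaces, a tiny per-coordinate change aligned with the loss gradient can move a point sharply across a decision boundary.

```python
# Standard FGSM-style perturbation of a toy linear classifier; this is a
# generic illustration, not the paper's framework.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 1000
w = torch.randn(d) / d ** 0.5                  # toy linear decision rule
x = torch.randn(d, requires_grad=True)         # an ordinary input
y = torch.tensor(1.0)

loss = nn.functional.binary_cross_entropy_with_logits((w * x).sum(), y)
loss.backward()

eps = 0.05                                     # tiny per-coordinate budget
x_adv = x.detach() + eps * x.grad.sign()       # imperceptible in any single coordinate
print((w * x).sum().item(), (w * x_adv).sum().item())  # decision score shifts sharply
```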
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations [15.957198667607006]
We introduce the first Class Attribute Inference Attack (CAIA) to infer sensitive attributes of individual classes in a black-box setting.
Our experiments in the face recognition domain show that CAIA can accurately infer undisclosed sensitive attributes, such as an individual's hair color, gender, and racial appearance.
arXiv Detail & Related papers (2023-03-16T13:10:58Z)
- Training privacy-preserving video analytics pipelines by suppressing features that reveal information about private attributes [40.31692020706419]
We consider an adversary with access to the features extracted by a deployed deep neural network and use these features to predict private attributes.
We modify the training of the network using a confusion loss that encourages the extraction of features that make it difficult for the adversary to accurately predict private attributes.
Results show that, compared to the original network, the proposed PrivateNet can reduce the leakage of private information from a state-of-the-art emotion recognition network by 2.88% for gender and by 13.06% for age group.
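A hedged sketch of confusion-style training (the paper's exact confusion loss, architecture, and data differ; the uniform-target KL term and layer sizes below are common choices used here as assumptions):

```python
# Illustrative adversarial/confusion training: penalize features from which an
# auxiliary adversary can predict the private attribute.
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Linear(32, 64), nn.ReLU())    # shared feature extractor
task_head = nn.Linear(64, 7)                          # e.g. emotion classes
adversary = nn.Linear(64, 2)                          # e.g. gender

opt_main = torch.optim.Adam(list(feat.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

x = torch.randn(64, 32)
y_task = torch.randint(0, 7, (64,))
y_priv = torch.randint(0, 2, (64,))

for _ in range(100):
    # 1) the adversary learns to read the private attribute from current features
    opt_adv.zero_grad()
    adv_loss = nn.functional.cross_entropy(adversary(feat(x).detach()), y_priv)
    adv_loss.backward()
    opt_adv.step()

    # 2) the extractor learns the task while confusing the adversary
    opt_main.zero_grad()
    z = feat(x)
    task_loss = nn.functional.cross_entropy(task_head(z), y_task)
    p_priv = torch.softmax(adversary(z), dim=-1)
    uniform = torch.full_like(p_priv, 1.0 / p_priv.size(-1))
    confusion = nn.functional.kl_div(p_priv.log(), uniform, reduction="batchmean")
    (task_loss + confusion).backward()
    opt_main.step()
```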
arXiv Detail & Related papers (2022-03-05T01:31:07Z)
- PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition [55.858374644761525]
Face recognition networks encode information about sensitive attributes while being trained for identity classification.
Existing bias mitigation approaches require end-to-end training and are unable to achieve high verification accuracy.
We present a descriptor-based adversarial de-biasing approach called the Protected Attribute Suppression System (PASS).
PASS can be trained on top of descriptors obtained from any previously trained high-performing network to classify identities and simultaneously reduce the encoding of sensitive attributes.
arXiv Detail & Related papers (2021-08-09T00:39:22Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature-preserving autoencoder filtering and the self-similarity of a support set to perform this detection.
Our method is attack-agnostic and also the first to explore detection for few-shot classifiers to the best of our knowledge.
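A hedged sketch of the self-similarity idea (the paper's exact statistic and the autoencoder filtering step are not reproduced here; the threshold is an illustrative assumption):

```python
# Flag supports whose embeddings are unusually dissimilar to the rest of the
# support set; a crude stand-in for the paper's self-similarity check.
import torch

def flag_suspicious_supports(features: torch.Tensor, threshold: float = 0.5):
    """features: (n_support, d) embeddings of one class's support examples."""
    f = torch.nn.functional.normalize(features, dim=-1)
    sim = f @ f.t()                              # pairwise cosine similarities
    n = sim.size(0)
    mean_sim = (sim.sum(dim=1) - 1.0) / (n - 1)  # exclude self-similarity (=1)
    return (mean_sim < threshold).nonzero(as_tuple=True)[0]

supports = torch.randn(5, 128)                   # toy support-set embeddings
print(flag_suspicious_supports(supports))
```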
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Open-set Adversarial Defense [93.25058425356694]
We show that open-set recognition systems are vulnerable to adversarial attacks.
Motivated by this observation, we emphasize the need for an Open-Set Adversarial Defense (OSAD) mechanism.
This paper proposes an Open-Set Defense Network (OSDN) as a solution to the OSAD problem.
arXiv Detail & Related papers (2020-09-02T04:35:33Z)
- Counterfactual Explanation Based on Gradual Construction for Deep Networks [17.79934085808291]
The patterns that deep networks have learned from a training dataset can be grasped by observing the feature variation among various classes.
Current approaches modify features to increase the classification probability of the target class, irrespective of the internal characteristics of deep networks.
We propose a counterfactual explanation method that exploits the statistics learned from a training dataset.
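A minimal gradient-based counterfactual sketch (the paper's gradual, training-statistics-guided construction differs in its details; the toy model and penalty weight are assumptions):

```python
# Nudge an input until a toy classifier assigns the desired class, keeping the
# modification sparse; a generic counterfactual baseline, not the paper's method.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))  # toy classifier
x0 = torch.randn(1, 20)                       # input to be explained
target_class = torch.tensor([2])              # desired counterfactual class

x_cf = x0.clone().requires_grad_(True)
opt = torch.optim.Adam([x_cf], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = (nn.functional.cross_entropy(model(x_cf), target_class)
            + 0.1 * (x_cf - x0).abs().sum())  # stay close to the original input
    loss.backward()
    opt.step()

print(model(x_cf).argmax(dim=-1))             # ideally the target class
```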
arXiv Detail & Related papers (2020-08-05T01:18:31Z)
- Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection [51.041763676948705]
Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations.
We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
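A compact sketch of the INLP loop (assuming a scikit-learn logistic-regression probe; the original implementation differs in how it retrains probes and composes the guarding projection):

```python
# Repeatedly fit a linear probe for the protected attribute and project its
# predictive direction(s) out of the representation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, z, n_iters=5):
    """X: (n, d) representations; z: (n,) protected-attribute labels."""
    d = X.shape[1]
    P = np.eye(d)                                     # accumulated guarding projection
    X_proj = X.copy()
    for _ in range(n_iters):
        probe = LogisticRegression(max_iter=1000).fit(X_proj, z)
        W = probe.coef_                               # direction(s) that predict z
        P_w = np.eye(d) - W.T @ np.linalg.pinv(W @ W.T) @ W   # nullspace projection
        P = P_w @ P
        X_proj = X @ P.T
    return P, X_proj

X = np.random.randn(500, 50)
z = (X[:, 0] + 0.1 * np.random.randn(500) > 0).astype(int)   # toy protected attribute
P, X_guarded = inlp(X, z)
```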
arXiv Detail & Related papers (2020-04-16T14:02:50Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
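A hedged sketch of smoothing over training labels (the paper gives a general certification framework; the base learner, flip probability, and majority vote below are illustrative, and no certificate is computed):

```python
# Majority vote over base classifiers trained on randomly label-flipped copies
# of the training set; illustrates the smoothing construction without the
# certification step from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def smoothed_predict(X_train, y_train, x_test, n_samples=50, flip_prob=0.1, seed=0):
    rng = np.random.default_rng(seed)
    votes = np.zeros(2, dtype=int)
    for _ in range(n_samples):
        flips = rng.random(len(y_train)) < flip_prob
        y_noisy = np.where(flips, 1 - y_train, y_train)        # random label flips
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
        votes[clf.predict(x_test.reshape(1, -1))[0]] += 1
    return int(votes.argmax()), votes

X = np.random.randn(200, 10)
y = (X[:, 0] > 0).astype(int)                                  # toy binary labels
print(smoothed_predict(X, y, X[0]))
```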
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.