Honest-but-Curious Nets: Sensitive Attributes of Private Inputs can be
Secretly Coded into the Entropy of Classifiers' Outputs
- URL: http://arxiv.org/abs/2105.12049v1
- Date: Tue, 25 May 2021 16:27:57 GMT
- Title: Honest-but-Curious Nets: Sensitive Attributes of Private Inputs can be
Secretly Coded into the Entropy of Classifiers' Outputs
- Authors: Mohammad Malekzadeh and Anastasia Borovykh and Deniz G\"und\"uz
- Abstract summary: Deep neural networks, trained for the classification of a non-sensitive target attribute, can reveal sensitive attributes of their input data.
We show that deep classifiers can be trained to secretly encode a sensitive attribute of users' input data, at inference time, into their outputs for the target attribute.
- Score: 1.0742675209112622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is known that deep neural networks, trained for the classification of a
non-sensitive target attribute, can reveal sensitive attributes of their input
data through features of different granularity extracted by the classifier.
Taking a step forward, we show that deep classifiers can be trained to
secretly encode a sensitive attribute of users' input data, at inference time,
into the classifier's outputs for the target attribute. This attack works
even if users have a white-box view of the classifier and keep all internal
representations hidden, exposing only the classifier's estimation of the
target attribute. We introduce an information-theoretical formulation of such
adversaries and present efficient empirical implementations for training
honest-but-curious (HBC) classifiers based on this formulation: deep models
that can be accurate in predicting the target attribute, but also can utilize
their outputs to secretly encode a sensitive attribute. Our evaluations on
several tasks in real-world datasets show that a semi-trusted server can build
a classifier that is not only perfectly honest but also accurately curious. Our
work highlights a vulnerability that can be exploited by malicious machine
learning service providers to attack their users' privacy in several seemingly
safe scenarios, such as encrypted inferences, computations at the edge, or
private knowledge distillation. We conclude by showing the difficulties in
distinguishing between standard and HBC classifiers and discussing potential
proactive defenses against this vulnerability of deep classifiers.
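A minimal sketch of the idea (not the authors' exact information-theoretical formulation; the architecture, loss weighting, and toy data below are illustrative assumptions): the classifier is trained with an ordinary loss on the target attribute, while an auxiliary decoder that sees only the released softmax vector is trained jointly to recover the sensitive attribute.

```python
# Illustrative sketch of an honest-but-curious (HBC) classifier; the joint
# loss and network sizes are assumptions, not the paper's exact recipe.
import torch
import torch.nn as nn

n_target, n_sensitive, in_dim = 10, 2, 32

classifier = nn.Sequential(                  # released model: input -> target logits
    nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_target))
decoder = nn.Linear(n_target, n_sensitive)   # curious party reads only the softmax outputs
opt = torch.optim.Adam(list(classifier.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(256, in_dim)                          # toy inputs
y_target = torch.randint(0, n_target, (256,))         # non-sensitive label (honest task)
y_sensitive = torch.randint(0, n_sensitive, (256,))   # label to be secretly encoded

for _ in range(200):
    probs = torch.softmax(classifier(x), dim=-1)      # the only thing users ever release
    honest = nn.functional.nll_loss(torch.log(probs + 1e-9), y_target)
    curious = nn.functional.cross_entropy(decoder(probs), y_sensitive)
    loss = honest + curious                           # trade-off weight of 1.0 is an assumption
    opt.zero_grad(); loss.backward(); opt.step()

# At inference, the server applies `decoder` to the released soft predictions to
# recover the sensitive attribute, while the target predictions remain accurate.
```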
Related papers
- From Generalist to Specialist: Exploring CWE-Specific Vulnerability Detection [1.9249287163937974]
Each Common Weakness Enumeration (CWE) represents a unique category of vulnerabilities with distinct characteristics, code semantics, and patterns.
Treating all vulnerabilities as a single label with a binary classification approach may oversimplify the problem.
arXiv Detail & Related papers (2024-08-05T09:12:39Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic and generalisable framework for which key behaviours observed in practical systems arise with high probability.
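As a concrete, hedged illustration of the phenomenon (a standard FGSM-style perturbation, not the framework proposed in the paper): in high-dimensional input spaces, a tiny per-coordinate change aligned with the loss gradient can move a point sharply across a decision boundary.

```python
# Standard FGSM-style perturbation of a toy linear classifier; this is a
# generic illustration, not the paper's framework.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 1000
w = torch.randn(d) / d ** 0.5                  # toy linear decision rule
x = torch.randn(d, requires_grad=True)         # an ordinary input
y = torch.tensor(1.0)

loss = nn.functional.binary_cross_entropy_with_logits((w * x).sum(), y)
loss.backward()

eps = 0.05                                     # tiny per-coordinate budget
x_adv = x.detach() + eps * x.grad.sign()       # imperceptible in any single coordinate
print((w * x).sum().item(), (w * x_adv).sum().item())  # decision score shifts sharply
```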
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations [15.957198667607006]
We introduce the first Class Attribute Inference Attack (CAIA) to infer sensitive attributes of individual classes in a black-box setting.
Our experiments in the face recognition domain show that CAIA can accurately infer undisclosed sensitive attributes, such as an individual's hair color, gender, and racial appearance.
arXiv Detail & Related papers (2023-03-16T13:10:58Z)
- Training privacy-preserving video analytics pipelines by suppressing features that reveal information about private attributes [40.31692020706419]
We consider an adversary with access to the features extracted by a deployed deep neural network and use these features to predict private attributes.
We modify the training of the network using a confusion loss that encourages the extraction of features that make it difficult for the adversary to accurately predict private attributes.
Results show that, compared to the original network, the proposed PrivateNet can reduce the leakage of private information from a state-of-the-art emotion recognition network by 2.88% for gender and by 13.06% for age group.
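A hedged sketch of confusion-style training (the paper's exact confusion loss, architecture, and data differ; the uniform-target KL term and layer sizes below are common choices used here as assumptions):

```python
# Illustrative adversarial/confusion training: penalize features from which an
# auxiliary adversary can predict the private attribute.
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Linear(32, 64), nn.ReLU())    # shared feature extractor
task_head = nn.Linear(64, 7)                          # e.g. emotion classes
adversary = nn.Linear(64, 2)                          # e.g. gender

opt_main = torch.optim.Adam(list(feat.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

x = torch.randn(64, 32)
y_task = torch.randint(0, 7, (64,))
y_priv = torch.randint(0, 2, (64,))

for _ in range(100):
    # 1) the adversary learns to read the private attribute from current features
    opt_adv.zero_grad()
    adv_loss = nn.functional.cross_entropy(adversary(feat(x).detach()), y_priv)
    adv_loss.backward()
    opt_adv.step()

    # 2) the extractor learns the task while confusing the adversary
    opt_main.zero_grad()
    z = feat(x)
    task_loss = nn.functional.cross_entropy(task_head(z), y_task)
    p_priv = torch.softmax(adversary(z), dim=-1)
    uniform = torch.full_like(p_priv, 1.0 / p_priv.size(-1))
    confusion = nn.functional.kl_div(p_priv.log(), uniform, reduction="batchmean")
    (task_loss + confusion).backward()
    opt_main.step()
```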
arXiv Detail & Related papers (2022-03-05T01:31:07Z)
- PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition [55.858374644761525]
Face recognition networks encode information about sensitive attributes while being trained for identity classification.
Existing bias mitigation approaches require end-to-end training and are unable to achieve high verification accuracy.
We present a descriptor-based adversarial de-biasing approach called the Protected Attribute Suppression System (PASS).
PASS can be trained on top of descriptors obtained from any previously trained high-performing network to classify identities and simultaneously reduce the encoding of sensitive attributes.
arXiv Detail & Related papers (2021-08-09T00:39:22Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature-preserving autoencoder filtering and the self-similarity of a support set to perform this detection.
Our method is attack-agnostic and also the first to explore detection for few-shot classifiers to the best of our knowledge.
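A hedged sketch of the self-similarity idea (the paper's exact statistic and the autoencoder filtering step are not reproduced here; the threshold is an illustrative assumption):

```python
# Flag supports whose embeddings are unusually dissimilar to the rest of the
# support set; a crude stand-in for the paper's self-similarity check.
import torch

def flag_suspicious_supports(features: torch.Tensor, threshold: float = 0.5):
    """features: (n_support, d) embeddings of one class's support examples."""
    f = torch.nn.functional.normalize(features, dim=-1)
    sim = f @ f.t()                              # pairwise cosine similarities
    n = sim.size(0)
    mean_sim = (sim.sum(dim=1) - 1.0) / (n - 1)  # exclude self-similarity (=1)
    return (mean_sim < threshold).nonzero(as_tuple=True)[0]

supports = torch.randn(5, 128)                   # toy support-set embeddings
print(flag_suspicious_supports(supports))
```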
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Open-set Adversarial Defense [93.25058425356694]
We show that open-set recognition systems are vulnerable to adversarial attacks.
Motivated by this observation, we emphasize the need for an Open-Set Adversarial Defense (OSAD) mechanism.
This paper proposes an Open-Set Defense Network (OSDN) as a solution to the OSAD problem.
arXiv Detail & Related papers (2020-09-02T04:35:33Z)
- Counterfactual Explanation Based on Gradual Construction for Deep Networks [17.79934085808291]
The patterns that deep networks have learned from a training dataset can be grasped by observing the feature variation among various classes.
Current approaches modify features to increase the classification probability of the target class, irrespective of the internal characteristics of deep networks.
We propose a counterfactual explanation method that exploits the statistics learned from a training dataset.
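A minimal gradient-based counterfactual sketch (the paper's gradual, training-statistics-guided construction differs in its details; the toy model and penalty weight are assumptions):

```python
# Nudge an input until a toy classifier assigns the desired class, keeping the
# modification sparse; a generic counterfactual baseline, not the paper's method.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))  # toy classifier
x0 = torch.randn(1, 20)                       # input to be explained
target_class = torch.tensor([2])              # desired counterfactual class

x_cf = x0.clone().requires_grad_(True)
opt = torch.optim.Adam([x_cf], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = (nn.functional.cross_entropy(model(x_cf), target_class)
            + 0.1 * (x_cf - x0).abs().sum())  # stay close to the original input
    loss.backward()
    opt.step()

print(model(x_cf).argmax(dim=-1))             # ideally the target class
```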
arXiv Detail & Related papers (2020-08-05T01:18:31Z)
- Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection [51.041763676948705]
Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations.
We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
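A compact sketch of the INLP loop (assuming a scikit-learn logistic-regression probe; the original implementation differs in how it retrains probes and composes the guarding projection):

```python
# Repeatedly fit a linear probe for the protected attribute and project its
# predictive direction(s) out of the representation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, z, n_iters=5):
    """X: (n, d) representations; z: (n,) protected-attribute labels."""
    d = X.shape[1]
    P = np.eye(d)                                     # accumulated guarding projection
    X_proj = X.copy()
    for _ in range(n_iters):
        probe = LogisticRegression(max_iter=1000).fit(X_proj, z)
        W = probe.coef_                               # direction(s) that predict z
        P_w = np.eye(d) - W.T @ np.linalg.pinv(W @ W.T) @ W   # nullspace projection
        P = P_w @ P
        X_proj = X @ P.T
    return P, X_proj

X = np.random.randn(500, 50)
z = (X[:, 0] + 0.1 * np.random.randn(500) > 0).astype(int)   # toy protected attribute
P, X_guarded = inlp(X, z)
```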
arXiv Detail & Related papers (2020-04-16T14:02:50Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
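A hedged sketch of smoothing over training labels (the paper gives a general certification framework; the base learner, flip probability, and majority vote below are illustrative, and no certificate is computed):

```python
# Majority vote over base classifiers trained on randomly label-flipped copies
# of the training set; illustrates the smoothing construction without the
# certification step from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def smoothed_predict(X_train, y_train, x_test, n_samples=50, flip_prob=0.1, seed=0):
    rng = np.random.default_rng(seed)
    votes = np.zeros(2, dtype=int)
    for _ in range(n_samples):
        flips = rng.random(len(y_train)) < flip_prob
        y_noisy = np.where(flips, 1 - y_train, y_train)        # random label flips
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
        votes[clf.predict(x_test.reshape(1, -1))[0]] += 1
    return int(votes.argmax()), votes

X = np.random.randn(200, 10)
y = (X[:, 0] > 0).astype(int)                                  # toy binary labels
print(smoothed_predict(X, y, X[0]))
```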
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.