Unlearning Protected User Attributes in Recommendations with Adversarial
Training
- URL: http://arxiv.org/abs/2206.04500v1
- Date: Thu, 9 Jun 2022 13:36:28 GMT
- Title: Unlearning Protected User Attributes in Recommendations with Adversarial
Training
- Authors: Christian Ganhör, David Penz, Navid Rekabsaz, Oleg Lesota, Markus Schedl
- Abstract summary: Collaborative filtering algorithms capture underlying consumption patterns, including the ones specific to particular demographics or protected information of users.
These encoded biases can influence the decision of a recommendation system towards further separation of the contents provided to various demographic subgroups.
In this work, we investigate the possibility and challenges of removing specific protected information of users from the learned interaction representations of an RS algorithm.
- Score: 10.268369743620159
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborative filtering algorithms capture underlying consumption patterns,
including the ones specific to particular demographics or protected information
of users, e.g. gender, race, and location. These encoded biases can influence
the decision of a recommendation system (RS) towards further separation of the
contents provided to various demographic subgroups, and raise privacy concerns
regarding the disclosure of users' protected attributes. In this work, we
investigate the possibility and challenges of removing specific protected
information of users from the learned interaction representations of an RS
algorithm, while maintaining its effectiveness. Specifically, we incorporate
adversarial training into the state-of-the-art MultVAE architecture, resulting
in a novel model, Adversarial Variational Auto-Encoder with Multinomial
Likelihood (Adv-MultVAE), which aims at removing the implicit information of
protected attributes while preserving recommendation performance. We conduct
experiments on the MovieLens-1M and LFM-2b-DemoBias datasets, and evaluate the
effectiveness of the bias mitigation method by the inability of external
attackers to reveal the users' gender information from the model. Compared
with the baseline MultVAE, the results show that Adv-MultVAE, with marginal
deterioration in performance (w.r.t. NDCG and recall), largely mitigates
inherent biases in the model on both datasets.
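As a concrete illustration of how adversarial unlearning can be wired into a MultVAE-style model, below is a minimal PyTorch sketch using a gradient reversal layer, so that a single backward pass trains the adversary head to predict the protected attribute while pushing the encoder to erase it. The layer sizes, module names, loss weights, and shallow one-layer encoder/decoder are illustrative assumptions, not the paper's actual architecture.

```python
# Hedged sketch of Adv-MultVAE-style adversarial attribute removal.
# Hyperparameters and the shallow encoder/decoder are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient for lam

class AdvMultVAESketch(nn.Module):
    def __init__(self, n_items, latent_dim=200, grl_lambda=1.0):
        super().__init__()
        self.encoder = nn.Linear(n_items, 2 * latent_dim)  # -> mu, logvar
        self.decoder = nn.Linear(latent_dim, n_items)      # multinomial logits
        self.adversary = nn.Linear(latent_dim, 2)          # e.g. binary gender head
        self.grl_lambda = grl_lambda

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        item_logits = self.decoder(z)
        attr_logits = self.adversary(GradReverse.apply(z, self.grl_lambda))
        return item_logits, mu, logvar, attr_logits

def loss_fn(item_logits, x, mu, logvar, attr_logits, attr, beta=0.2):
    # Multinomial log-likelihood over the user's interaction vector x.
    recon = -(F.log_softmax(item_logits, dim=-1) * x).sum(-1).mean()
    kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    adv = F.cross_entropy(attr_logits, attr)
    # Because of the GRL, minimizing `adv` trains the adversary head while
    # *maximizing* the attribute loss w.r.t. the encoder, removing the signal.
    return recon + beta * kld + adv
```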
Related papers
- Simultaneous Unlearning of Multiple Protected User Attributes From Variational Autoencoder Recommenders Using Adversarial Training [8.272412404173954]
We present AdvXMultVAE, which aims to unlearn multiple protected attributes simultaneously to improve fairness across demographic user groups.
Our experiments on two datasets, LFM-2b-100k and ML-1M, show that our approach can yield better results than its singular-removal counterparts.
arXiv Detail & Related papers (2024-10-28T12:36:00Z)
- Analyzing Inference Privacy Risks Through Gradients in Machine Learning [17.2657358645072]
We present a unified game-based framework that encompasses a broad range of attacks including attribute, property, distributional, and user disclosures.
Our results demonstrate the inefficacy of solely relying on data aggregation to achieve privacy against inference attacks in distributed learning.
arXiv Detail & Related papers (2024-08-29T21:21:53Z)
- InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling [66.3072381478251]
Reward hacking, also termed reward overoptimization, remains a critical challenge.
We propose a reward modeling framework, InfoRM, which introduces a variational information bottleneck objective.
We show that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets.
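For reference, a variational information-bottleneck objective of the kind the summary names typically takes the following form; the notation and exact decomposition here are our assumption, not necessarily InfoRM's precise loss:

```latex
% Generic variational IB bound for a reward model with input x, reward label y,
% and bottleneck latent z (assumed form):
\max_{\phi,\theta}\;
\mathbb{E}_{x,y}\Big[\mathbb{E}_{z\sim q_\phi(z\mid x)}\big[\log p_\theta(y\mid z)\big]\Big]
\;-\;\beta\,\mathbb{E}_{x}\Big[\mathrm{KL}\big(q_\phi(z\mid x)\,\big\|\,p(z)\big)\Big]
```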
arXiv Detail & Related papers (2024-02-14T17:49:07Z)
- Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems [28.566330708233824]
We find that attackers can extract private information, i.e., gender, race, and age, from a trained model even if these attributes were not explicitly encountered during training.
To protect the sensitive attribute of users, Attribute Unlearning (AU) aims to degrade attacking performance and make target attributes indistinguishable.
arXiv Detail & Related papers (2023-10-06T09:36:44Z)
- Ex-Ante Assessment of Discrimination in Dataset [20.574371560492494]
Data owners face increasing liability for how the use of their data could harm underprivileged communities.
We propose FORESEE, a FORESt of decision trEEs algorithm, which generates a score that captures how likely an individual's response varies with sensitive attributes.
arXiv Detail & Related papers (2022-08-16T19:28:22Z)
- Debiasing Learning for Membership Inference Attacks Against Recommender Systems [79.48353547307887]
Learned recommender systems may inadvertently leak information about their training data, leading to privacy violations.
We investigate privacy threats faced by recommender systems through the lens of membership inference.
We propose a Debiasing Learning for Membership Inference Attacks against recommender systems (DL-MIA) framework that has four main components.
arXiv Detail & Related papers (2022-06-24T17:57:34Z)
- Learning Fair Representations via Rate-Distortion Maximization [16.985698188471016]
We present Fairness-aware Rate Maximization (FaRM), which removes demographic information by making representations of instances belonging to the same protected attribute class uncorrelated, using the rate-distortion function.
FaRM achieves state-of-the-art performance on several datasets, and its learned representations leak significantly less protected-attribute information under attack by a non-linear probing network.
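For orientation, the coding-rate function commonly used in the rate-distortion line of work FaRM draws on is given below; whether FaRM uses exactly this form is our assumption:

```latex
% Coding rate of representations Z = [z_1, ..., z_n] in R^{d x n} at distortion eps
% (standard coding-rate form; assumed, not verified against FaRM):
R(Z;\epsilon) \;=\; \frac{1}{2}\,\log\det\!\Big(I_d + \frac{d}{n\,\epsilon^{2}}\,ZZ^{\top}\Big)
```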
arXiv Detail & Related papers (2022-01-31T19:00:52Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
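Neural mutual-information estimators of the kind mentioned here are commonly built on the Donsker-Varadhan lower bound (MINE-style); that CSAD uses this particular bound is our assumption:

```latex
% Donsker-Varadhan bound underlying MINE-style neural MI estimators,
% with a critic network T_theta (assumed estimator form):
I(X;Y)\;\ge\;\sup_{T_\theta}\;
\mathbb{E}_{p(x,y)}\big[T_\theta(x,y)\big]
\;-\;\log\,\mathbb{E}_{p(x)\,p(y)}\big[e^{T_\theta(x,y)}\big]
```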
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
- RDP-GAN: A Rényi-Differential Privacy based Generative Adversarial Network [75.81653258081435]
Generative adversarial network (GAN) has attracted increasing attention recently owing to its impressive ability to generate realistic samples with high privacy protection.
However, when GANs are applied to sensitive or private training examples, such as medical or financial records, they may still divulge individuals' sensitive and private information.
We propose a Rényi-differentially private GAN (RDP-GAN), which achieves differential privacy (DP) in a GAN by carefully adding random noise to the value of the loss function during training.
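A minimal sketch of such loss perturbation as we read the summary, assuming the Gaussian mechanism; the noise form and the calibration below are our assumptions:

```latex
% Perturb the loss value with Gaussian noise; for loss sensitivity Delta_L,
% the Gaussian mechanism satisfies (alpha, eps(alpha))-Renyi DP (assumed calibration):
\tilde{\mathcal{L}} \;=\; \mathcal{L} + \mathcal{N}(0,\sigma^{2}),
\qquad
\varepsilon(\alpha)\;=\;\frac{\alpha\,\Delta_{\mathcal{L}}^{2}}{2\sigma^{2}}
```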
arXiv Detail & Related papers (2020-07-04T09:51:02Z)