Making Users Indistinguishable: Attribute-wise Unlearning in Recommender
Systems
- URL: http://arxiv.org/abs/2310.05847v1
- Date: Fri, 6 Oct 2023 09:36:44 GMT
- Title: Making Users Indistinguishable: Attribute-wise Unlearning in Recommender
Systems
- Authors: Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han,
Dan Meng, Jun Wang
- Abstract summary: We find that attackers can extract private information, e.g., gender, race, and age, from a trained model even if such information was never explicitly encountered during training.
To protect users' sensitive attributes, Attribute Unlearning (AU) aims to degrade attacking performance and make target attributes indistinguishable.
- Score: 28.566330708233824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the growing privacy concerns in recommender systems, recommendation
unlearning, i.e., forgetting the impact of specific learned targets, is getting
increasing attention. Existing studies predominantly use training data, i.e.,
model inputs, as the unlearning target. However, we find that attackers can
extract private information, e.g., gender, race, and age, from a trained model
even if such information has never been explicitly encountered during training.
We refer to this unseen information as attributes and treat them as the
unlearning target. To protect users' sensitive attributes, Attribute
Unlearning (AU) aims to degrade attacking performance and make target
attributes indistinguishable. In
this paper, we focus on a strict but practical setting of AU, namely
Post-Training Attribute Unlearning (PoT-AU), where unlearning can only be
performed after the training of the recommendation model is completed. To
address the PoT-AU problem in recommender systems, we design a two-component
loss function that consists of i) a distinguishability loss, which makes
attribute labels indistinguishable to attackers, and ii) a regularization
loss, which prevents drastic model changes that would degrade recommendation
performance. Specifically, we investigate two types of
distinguishability measurements, i.e., user-to-user and
distribution-to-distribution. We use the stochastic gradient descent algorithm
to optimize our proposed loss. Extensive experiments on three real-world
datasets demonstrate the effectiveness of our proposed methods.
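The abstract outlines the method at a high level; the following is a minimal, hedged sketch of the two-component loss, assuming unlearning edits only a user-embedding matrix, a binary target attribute, maximum mean discrepancy (MMD) with a Gaussian kernel as an illustrative distribution-to-distribution measurement, and an L2 anchor to the original embeddings as the regularization loss. Function names and hyperparameters are ours, not the paper's.

```python
# Hedged sketch of the two-component PoT-AU loss. Assumptions (not from the
# paper): unlearning edits only a user-embedding matrix, the attribute is
# binary, MMD with a Gaussian kernel is the distribution-to-distribution
# measurement, and an L2 anchor to the original embeddings is the regularizer.
import torch

def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel values between two sets of embeddings."""
    return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma ** 2))

def mmd(x, y, sigma=1.0):
    """Maximum mean discrepancy: small when the two distributions match."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean())

def pot_au_loss(user_emb, orig_emb, attr, lam=0.1):
    # i) distinguishability loss: pull the two attribute groups' embedding
    #    distributions together so the attribute cannot be told apart.
    dist = mmd(user_emb[attr == 0], user_emb[attr == 1])
    # ii) regularization loss: stay near the trained embeddings to
    #     preserve recommendation performance.
    reg = (user_emb - orig_emb).pow(2).mean()
    return dist + lam * reg

# Toy usage with random data standing in for a trained recommender.
torch.manual_seed(0)
orig_emb = torch.randn(500, 32)            # embeddings after training
attr = torch.randint(0, 2, (500,))         # e.g., a binary gender label
user_emb = torch.nn.Parameter(orig_emb.clone())
opt = torch.optim.SGD([user_emb], lr=0.5)  # the paper optimizes with SGD
for _ in range(50):
    opt.zero_grad()
    pot_au_loss(user_emb, orig_emb, attr).backward()
    opt.step()
```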
Related papers
- Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models [25.91643745340183]
Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora.
This poses a risk of privacy and copyright violations, highlighting the need for efficient machine unlearning methods.
We propose two novel techniques for robust and efficient unlearning for LLMs.
arXiv Detail & Related papers (2024-08-13T04:18:32Z)
- Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback [110.16220825629749]
Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models.
In this work, we identify four core aspects of preference-based learning: preference data, learning algorithm, reward model, and policy training prompts.
Our findings indicate that all aspects are important for performance, with better preference data leading to the largest improvements.
arXiv Detail & Related papers (2024-06-13T16:17:21Z)
- Partially Blinded Unlearning: Class Unlearning for Deep Networks a Bayesian Perspective [4.31734012105466]
Machine Unlearning is the process of selectively discarding information designated to specific sets or classes of data from a pre-trained model.
We propose a methodology tailored for the purposeful elimination of information linked to a specific class of data from a pre-trained classification network.
Our novel approach, termed Partially-Blinded Unlearning (PBU), surpasses existing state-of-the-art class unlearning methods, demonstrating superior effectiveness (a generic baseline for this task is sketched below).
arXiv Detail & Related papers (2024-03-24T17:33:22Z)
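As a point of reference for the class-unlearning task described in the PBU entry above, here is a generic, hedged baseline: gradient ascent on forget-class samples combined with an L2 anchor to the pre-trained weights. This illustrates the task, not the PBU method itself; all names and the loop structure are assumptions.

```python
# Generic class-unlearning baseline (NOT the PBU method): raise the loss on
# forget-class samples while anchoring weights near their trained values.
import torch
import torch.nn.functional as F

def unlearn_class(model, forget_batches, lam=1.0, lr=1e-3):
    """forget_batches yields (x, y) pairs belonging only to the forget class."""
    anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in forget_batches:
        opt.zero_grad()
        ascent = -F.cross_entropy(model(x), y)  # gradient ascent on forget data
        stay = sum((p - anchor[n]).pow(2).sum()
                   for n, p in model.named_parameters())
        (ascent + lam * stay).backward()
        opt.step()
    return model
```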
- Post-Training Attribute Unlearning in Recommender Systems [37.67195112898097]
Existing studies predominantly use training data, i.e., model inputs, as the unlearning target.
We name this unseen information an attribute and treat it as the unlearning target.
To protect users' sensitive attributes, Attribute Unlearning (AU) aims to make target attributes indistinguishable.
arXiv Detail & Related papers (2024-03-11T14:02:24Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information on a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation level, and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information (the second idea is sketched below).
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
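A hedged sketch of the weight-importance idea from the entry above: score each parameter by its squared gradient on the forget instances and let only the top-scoring, "guilty" weights move during unlearning. The scoring rule and the 5% keep ratio are illustrative assumptions, not the paper's exact metric.

```python
# Illustrative weight-importance masking for instance-wise unlearning.
# Assumes every parameter receives a gradient on the forget batch.
import torch
import torch.nn.functional as F

def importance_masks(model, x_forget, y_forget, keep_ratio=0.05):
    """Mask = 1 for the parameters most implicated in the forget instances."""
    model.zero_grad()
    F.cross_entropy(model(x_forget), y_forget).backward()
    scores = torch.cat([p.grad.pow(2).flatten() for p in model.parameters()])
    thresh = torch.quantile(scores, 1.0 - keep_ratio)
    return [(p.grad.pow(2) >= thresh).float() for p in model.parameters()]

# During each unlearning step, gate the gradients so only the top-scoring
# parameters are updated:
#   for p, m in zip(model.parameters(), masks):
#       p.grad.mul_(m)
```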
- Debiasing Learning for Membership Inference Attacks Against Recommender Systems [79.48353547307887]
Learned recommender systems may inadvertently leak information about their training data, leading to privacy violations.
We investigate privacy threats faced by recommender systems through the lens of membership inference.
We propose a Debiasing Learning for Membership Inference Attacks against recommender systems (DL-MIA) framework that has four main components (a generic attack is sketched below).
arXiv Detail & Related papers (2022-06-24T17:57:34Z)
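To make the membership-inference threat concrete, below is a generic attack sketch, not the DL-MIA framework itself: the attacker featurizes how similar a user's recommendations are to their known interactions and fits a classifier on shadow users with known membership. The features and all names are illustrative assumptions.

```python
# Generic membership-inference attack against a recommender (illustrative;
# not DL-MIA). Members' recommendations tend to resemble their interactions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def attack_features(histories, recommendations):
    """Per user: mean/max cosine similarity between interacted and
    recommended item vectors."""
    feats = []
    for hist, recs in zip(histories, recommendations):
        h = hist / (np.linalg.norm(hist, axis=1, keepdims=True) + 1e-8)
        r = recs / (np.linalg.norm(recs, axis=1, keepdims=True) + 1e-8)
        sims = h @ r.T
        feats.append([sims.mean(), sims.max()])
    return np.array(feats)

# Shadow users with known membership labels train the attack model
# (random vectors stand in for real item embeddings here).
rng = np.random.default_rng(0)
hist = [rng.normal(size=(5, 16)) for _ in range(100)]
recs = [rng.normal(size=(10, 16)) for _ in range(100)]
is_member = rng.integers(0, 2, 100)
attacker = LogisticRegression().fit(attack_features(hist, recs), is_member)
```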
- Unlearning Protected User Attributes in Recommendations with Adversarial Training [10.268369743620159]
Collaborative filtering algorithms capture underlying consumption patterns, including ones specific to particular demographics or to protected information of users.
These encoded biases can steer a recommendation system toward further separating the content provided to various demographic subgroups.
In this work, we investigate the possibility and challenges of removing specific protected information of users from the learned interaction representations of an RS algorithm (see the adversarial sketch below).
arXiv Detail & Related papers (2022-06-09T13:36:28Z)
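A minimal sketch in the spirit of the entry above: an attacker head predicts the protected attribute from user embeddings, and a gradient-reversal layer trains the embeddings to defeat it. Gradient reversal is a standard stand-in here; the paper's exact adversarial setup may differ, and all names are ours.

```python
# Adversarial removal of a protected attribute via gradient reversal
# (illustrative stand-in for the paper's adversarial training).
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad  # the embeddings receive reversed gradients

torch.manual_seed(0)
user_emb = torch.nn.Parameter(torch.randn(500, 32))  # learned representations
attacker = torch.nn.Linear(32, 2)                    # predicts, e.g., gender
attr = torch.randint(0, 2, (500,))
opt = torch.optim.Adam([user_emb, *attacker.parameters()], lr=1e-3)

for _ in range(200):
    opt.zero_grad()
    # The attacker minimizes cross-entropy; through the reversal, the
    # embeddings are simultaneously pushed to make the attribute unpredictable.
    F.cross_entropy(attacker(GradReverse.apply(user_emb)), attr).backward()
    opt.step()
```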
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner (sketched below).
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
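A hedged reading of class-aware feature alignment at test time: each test feature is pulled toward the source-domain statistics of its pseudo-class via a Mahalanobis-style distance with diagonal covariance. CAFA's exact distance and statistics may differ; the names and shapes below are assumptions.

```python
# Class-aware feature alignment (illustrative reading of CAFA): align test
# features with source class statistics, class by class.
import torch

def class_aware_alignment_loss(feats, pseudo_labels, src_means, src_vars):
    """Mahalanobis-style distance (diagonal covariance) between each test
    feature and the source statistics of its pseudo-class."""
    mu = src_means[pseudo_labels]    # (batch, dim) per-sample class mean
    var = src_vars[pseudo_labels]    # (batch, dim) per-sample class variance
    return ((feats - mu).pow(2) / (var + 1e-6)).sum(dim=1).mean()

# Toy usage: 10 classes, 32-dim features, pseudo-labels from the model.
feats = torch.randn(64, 32, requires_grad=True)
pseudo = torch.randint(0, 10, (64,))
loss = class_aware_alignment_loss(feats, pseudo,
                                  torch.zeros(10, 32), torch.ones(10, 32))
loss.backward()  # in practice, gradients flow into the feature extractor
```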
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters (see the sketch below).
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
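A minimal sketch of closed-form unlearning via influence functions, assuming a small, twice-differentiable model where the full Hessian is tractable: a single Newton-style step, theta <- theta + H^{-1} grad(L_forget), removes the forget points' first-order influence. The paper generalizes this idea to features and labels; this toy version and its names are ours.

```python
# One-step influence-function unlearning (illustrative; assumes a small,
# twice-differentiable model with a flat parameter vector at its optimum).
import torch

def unlearn_closed_form(theta, loss_fn, data_all, data_forget):
    """theta: flat trained parameters; loss_fn(theta, data) -> scalar loss."""
    grad_f = torch.autograd.grad(loss_fn(theta, data_forget), theta)[0]
    hess = torch.autograd.functional.hessian(
        lambda t: loss_fn(t, data_all), theta)
    # Newton step toward the optimum of the loss without the forget points:
    # theta' = theta + H^{-1} * grad of the forget-set loss.
    return (theta + torch.linalg.solve(hess, grad_f)).detach()

# Toy usage: least squares, forgetting the first 10 points.
torch.manual_seed(0)
X, y = torch.randn(100, 5), torch.randn(100)
sq_loss = lambda t, idx: ((X[idx] @ t - y[idx]) ** 2).sum()
theta = torch.linalg.solve(X.T @ X, X.T @ y).requires_grad_(True)
theta_new = unlearn_closed_form(theta, sq_loss, torch.arange(100),
                                torch.arange(10))
```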
- Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as the source domain and TED-LIUM 3 as well as SWITCHBOARD as target domains show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.