The trade-off between data minimization and fairness in collaborative filtering
- URL: http://arxiv.org/abs/2410.07182v1
- Date: Sat, 21 Sep 2024 02:32:26 GMT
- Title: The trade-off between data minimization and fairness in collaborative filtering
- Authors: Nasim Sonboli, Sipei Li, Mehdi Elahi, Asia Biega
- Abstract summary: The General Data Protection Regulation (GDPR) aims to safeguard individuals' personal information from harm.
While full compliance is mandatory in the EU, it is not elsewhere.
This paper studies the relationship between principles of data minimization and fairness in recommender systems.
- Score: 1.8936798735951967
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The General Data Protection Regulation (GDPR) aims to safeguard individuals' personal information from harm. While full compliance is mandatory in the European Union, and the California Privacy Rights Act (CPRA) imposes similar obligations, it is not elsewhere. GDPR requires simultaneous compliance with all of its principles, such as fairness, accuracy, and data minimization, but it overlooks the potential contradictions among those principles. The matter becomes even more complex when compliance is required from decision-making systems. It is therefore essential to investigate the feasibility of simultaneously achieving the goals of GDPR and machine learning, and the trade-offs that might be forced upon us. This paper studies the relationship between the principles of data minimization and fairness in recommender systems. We operationalize data minimization via active learning (AL) because, unlike many other methods, it can preserve high accuracy while allowing for strategic data collection, hence minimizing the amount of data collected. We implemented several active learning strategies (personalized and non-personalized) and conducted a comparative analysis focusing on accuracy and fairness on two publicly available datasets. The results demonstrate that different AL strategies may have different impacts on the accuracy of recommender systems, with nearly all strategies negatively impacting fairness. There has been little to no prior work on the trade-off between data minimization and fairness, on the pros and cons of active learning methods as tools for implementing data minimization, or on the potential impacts of AL on fairness. By exploring these critical aspects, we offer valuable insights for developing recommender systems that are GDPR compliant.
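The abstract operationalizes data minimization through active-learning rating elicitation but does not name the exact strategies. Below is a minimal sketch of one common non-personalized strategy (popularity-based elicitation), with a toy ratings matrix and an item-mean predictor standing in for a real recommender; every name, number, and dataset here is illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 50 users x 30 items; `mask` marks ratings users could provide.
true = rng.integers(1, 6, size=(50, 30)).astype(float)
mask = rng.random((50, 30)) < 0.4
observed = np.zeros_like(true)  # ratings actually collected so far

def elicit_by_popularity(budget):
    """Non-personalized AL: query the most widely ratable items first."""
    popularity = mask.sum(axis=0)
    for item in np.argsort(-popularity)[:budget]:
        observed[:, item] = np.where(mask[:, item], true[:, item], 0.0)

def rmse_item_means():
    """Stand-in recommender: predict each item's mean observed rating."""
    counts = (observed > 0).sum(axis=0)
    means = np.where(counts > 0,
                     observed.sum(axis=0) / np.maximum(counts, 1),
                     3.0)  # global fallback for items never elicited
    pred = np.broadcast_to(means, true.shape)
    return float(np.sqrt(((pred - true)[mask] ** 2).mean()))

elicit_by_popularity(budget=10)   # collect ratings for only 10 of 30 items
print(f"RMSE with minimized data: {rmse_item_means():.3f}")
```

A fairness analysis in the paper's spirit would additionally compare this RMSE across user groups rather than only in aggregate.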
Related papers
- The Data Minimization Principle in Machine Learning [61.17813282782266]
Data minimization aims to reduce the amount of data collected, processed or retained.
It has been endorsed by various global data protection regulations.
However, its practical implementation remains a challenge due to the lack of a rigorous formulation.
arXiv Detail & Related papers (2024-05-29T19:40:27Z) - Learning to Limit Data Collection via Scaling Laws: Data Minimization Compliance in Practice [62.44110411199835]
We build on literature in machine learning and law to propose a framework for limiting data collection, based on an interpretation of data minimization that ties collection to system performance.
We formalize a data minimization criterion based on performance curve derivatives and provide an effective and interpretable piecewise power law technique.
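The derivative-based criterion above can be sketched with a single power-law segment (the paper's technique is piecewise; this one-segment fit, the learning-curve points, and the threshold `tau` are all illustrative assumptions):

```python
import numpy as np

# Hypothetical learning-curve points (training-set size, validation error);
# a scaling law posits err(n) ~ b * n**s with s < 0, fit in log-log space.
n = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
err = np.array([0.40, 0.31, 0.24, 0.185, 0.143, 0.110])

s, log_b = np.polyfit(np.log(n), np.log(err), 1)  # slope s is negative
b = np.exp(log_b)

# Minimization criterion on the curve's derivative: stop collecting once
# the marginal error reduction per extra sample |d err/d n| drops below tau.
def marginal_gain(m):
    return abs(b * s) * m ** (s - 1.0)

tau = 1e-5
candidates = [500, 1000, 2000, 4000, 8000]
stop_at = next(m for m in candidates if marginal_gain(m) < tau)
print(f"fitted exponent {s:.3f}; stop collecting around n = {stop_at}")
```

The criterion makes "enough data" operational: collection halts where the fitted curve says an extra sample buys almost no accuracy.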
arXiv Detail & Related papers (2021-07-16T19:59:01Z) - Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
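The importance-sampling weighted estimator this entry describes can be sketched as follows. The sketch only *evaluates* one fixed target policy from adaptively logged data; the full ERM algorithm would minimize this estimate over a hypothesis class. The logging scheme, propensities, and losses are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Adaptive logging: arm 1 is drawn with a time-decaying propensity p_t,
# so a naive average over the log is biased toward early exploration.
T = 5000
propensity = np.clip(0.9 - 0.8 * np.arange(T) / T, 0.1, 0.9)
actions = (rng.random(T) < propensity).astype(int)            # 1 = arm 1
loss = np.where(actions == 1, 0.3, 0.7) + 0.05 * rng.standard_normal(T)

def weighted_loss_estimate(target_prob=0.5):
    """Importance-weighted average loss of a policy playing arm 1
    with probability `target_prob`, from adaptively collected data."""
    behavior = np.where(actions == 1, propensity, 1 - propensity)
    target = np.where(actions == 1, target_prob, 1 - target_prob)
    return float(np.mean((target / behavior) * loss))

# True expected loss of the 50/50 target policy: 0.5*0.3 + 0.5*0.7 = 0.5
print(f"IS-weighted estimate: {weighted_loss_estimate():.3f}")
```

Because the behavior propensities are known exactly here, the weighted average is unbiased for the target policy's expected loss even though the data were collected adaptively.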
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Reviving Purpose Limitation and Data Minimisation in Personalisation, Profiling and Decision-Making Systems [0.0]
This paper determines, through an interdisciplinary law and computer science lens, whether data minimisation and purpose limitation can be meaningfully implemented in data-driven systems.
Our analysis reveals that the two legal principles continue to play an important role in mitigating the risks of personal data processing.
We highlight that even though these principles are important safeguards in the systems under consideration, there are important limits to their practical implementation.
arXiv Detail & Related papers (2021-01-15T16:36:29Z) - Is Pessimism Provably Efficient for Offline RL? [104.00628430454479]
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori.
We propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function.
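The pessimism idea behind PEVI can be sketched in a tiny tabular setting: subtract an uncertainty penalty shrinking like `beta / sqrt(count(s, a))` from each estimated Q-value before the greedy step. The MDP, counts, and penalty scale below are illustrative, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)
S, A, gamma, beta = 3, 2, 0.9, 0.5

# Empirical model estimated from an offline dataset; counts measure how
# well each (state, action) pair is covered by the data.
counts = rng.integers(1, 200, size=(S, A)).astype(float)
P_hat = rng.dirichlet(np.ones(S), size=(S, A))   # estimated transitions
R_hat = rng.random((S, A))                       # estimated rewards in [0, 1]

V = np.zeros(S)
for _ in range(200):                             # value iteration
    Q = R_hat + gamma * P_hat @ V                # (S, A) Bellman backup
    Q_pess = Q - beta / np.sqrt(counts)          # pessimism penalty
    V = Q_pess.max(axis=1)

policy = Q_pess.argmax(axis=1)
print("pessimistic values:", np.round(V, 3), "policy:", policy)
```

Poorly covered actions get large penalties, so the greedy policy avoids them; that is the mechanism by which pessimism guards against distribution shift in offline RL.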
arXiv Detail & Related papers (2020-12-30T09:06:57Z) - Privacy Preservation in Federated Learning: An insightful survey from the GDPR Perspective [10.901568085406753]
This article surveys state-of-the-art privacy-preservation techniques that can be employed in federated learning (FL).
Recent research has demonstrated that keeping data and computation on-device in FL is not enough to guarantee privacy.
This is because the ML model parameters exchanged between parties in an FL system can be exploited in some privacy attacks.
arXiv Detail & Related papers (2020-11-10T21:41:25Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z) - Operationalizing the Legal Principle of Data Minimization for Personalization [64.0027026050706]
We identify a lack of a homogeneous interpretation of the data minimization principle and explore two operational definitions applicable in the context of personalization.
We find that the performance decrease incurred by data minimization might not be substantial, but it might disparately impact different users.
arXiv Detail & Related papers (2020-05-28T00:43:06Z)
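The disparate-impact finding in the last entry, that data minimization costs little on average yet hurts some users badly, can be made concrete with a per-user check. The per-user RMSEs and the 20% minority penalty below are fabricated solely to illustrate the metric.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-user RMSEs of a recommender trained on full data vs. on
# a minimized dataset; we check whether minimization hurts users unevenly.
n_users = 1000
rmse_full = 0.85 + 0.10 * rng.random(n_users)
# Minimization adds a small average cost but a large cost for a 20% minority.
penalty = np.where(rng.random(n_users) < 0.2, 0.30, 0.02)
rmse_min = rmse_full + penalty

drop = rmse_min - rmse_full               # per-user performance decrease
avg_cost = drop.mean()                    # looks mild in aggregate
worst_decile = np.quantile(drop, 0.9)     # but is severe for some users
print(f"mean cost {avg_cost:.3f}, 90th-percentile cost {worst_decile:.3f}")
```

Reporting only the mean would hide the harm; a quantile (or group-wise) view is what reveals the disparate impact.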
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.