Fair Play for Individuals, Foul Play for Groups? Auditing Anonymization's Impact on ML Fairness
- URL: http://arxiv.org/abs/2505.07985v1
- Date: Mon, 12 May 2025 18:32:28 GMT
- Title: Fair Play for Individuals, Foul Play for Groups? Auditing Anonymization's Impact on ML Fairness
- Authors: Héber H. Arcolezi, Mina Alishahi, Adda-Akram Bendoukha, Nesrine Kaaniche
- Abstract summary: Anonymization techniques can make it more difficult to accurately identify individuals. Group fairness metrics can be degraded by up to four orders of magnitude. Individual fairness metrics tend to improve under stronger anonymization. This study provides critical insights into the trade-offs between privacy, fairness, and utility.
- Score: 1.1999555634662633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) algorithms rely heavily on the availability of training data, which, depending on the domain, often includes sensitive information about data providers. This raises critical privacy concerns. Anonymization techniques have emerged as a practical solution to address these issues by generalizing features or suppressing data to make it more difficult to accurately identify individuals. Although recent studies have shown that privacy-enhancing technologies can influence ML predictions across different subgroups, thus affecting fair decision-making, the specific effects of anonymization techniques, such as $k$-anonymity, $\ell$-diversity, and $t$-closeness, on ML fairness remain largely unexplored. In this work, we systematically audit the impact of anonymization techniques on ML fairness, evaluating both individual and group fairness. Our quantitative study reveals that anonymization can degrade group fairness metrics by up to four orders of magnitude. Conversely, similarity-based individual fairness metrics tend to improve under stronger anonymization, largely as a result of increased input homogeneity. By analyzing varying levels of anonymization across diverse privacy settings and data distributions, this study provides critical insights into the trade-offs between privacy, fairness, and utility, offering actionable guidelines for responsible AI development. Our code is publicly available at: https://github.com/hharcolezi/anonymity-impact-fairness.
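To make the kind of audit described above concrete, the sketch below is a minimal illustration rather than the authors' released pipeline (the full experiments are in the linked repository): the synthetic data, the age-binning generalization rule, and the specific metric choices are assumptions. It coarsens a quasi-identifier in the style of $k$-anonymity generalization, trains a simple classifier, and reports a group fairness metric (demographic parity difference) alongside a similarity-based individual fairness proxy (prediction consistency among records that share the same generalized value).

```python
# Illustrative sketch only: k-anonymity-style generalization followed by a
# group-fairness vs. individual-fairness comparison on synthetic data.
# Data, column roles, and metric choices are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

age = rng.integers(18, 80, size=n)          # quasi-identifier
group = rng.integers(0, 2, size=n)          # protected attribute (0/1)
score = 0.5 * age + 5.0 * group + rng.normal(0.0, 5.0, size=n)
y = (score > np.median(score)).astype(int)  # binary label

def generalize(values, bin_width):
    """Coarsen a numeric quasi-identifier into bins (generalization)."""
    return (values // bin_width) * bin_width

def audit(feature):
    """Fit a classifier and return (group fairness gap, consistency proxy)."""
    X = np.column_stack([feature, group])
    pred = LogisticRegression().fit(X, y).predict(X)
    # Group fairness: demographic parity difference between the two groups.
    dpd = abs(pred[group == 0].mean() - pred[group == 1].mean())
    # Individual fairness proxy: records sharing the same generalized value
    # are treated as "similar"; measure how uniformly they are classified.
    consistency = np.mean([
        1.0 - pred[feature == v].std()
        for v in np.unique(feature)
        if (feature == v).sum() > 1
    ])
    return dpd, consistency

for width in (1, 10, 30):  # wider bins = stronger generalization
    dpd, cons = audit(generalize(age, width))
    print(f"bin width {width:2d}: demographic parity diff = {dpd:.3f}, "
          f"consistency = {cons:.3f}")
```

Sweeping the bin width mirrors the abstract's framing: wider bins make inputs more homogeneous, which is the mechanism the authors credit for the apparent improvement in similarity-based individual fairness under stronger anonymization.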
Related papers
- Self-Refining Language Model Anonymizers via Adversarial Distillation [49.17383264812234]
Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data poses emerging privacy risks. We introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization.
arXiv Detail & Related papers (2025-06-02T08:21:27Z)
- A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness: Preliminary Results [5.618541935188389]
Differential privacy (DP) is the predominant solution for privacy-preserving machine learning (ML) algorithms.
Recent experimental studies have shown that local DP can impact ML prediction for different subgroups of individuals.
We study how the fairness of the decisions made by the ML model changes under local DP for different levels of privacy and data distributions.
arXiv Detail & Related papers (2024-05-23T15:54:03Z)
- The Double-Edged Sword of Big Data and Information Technology for the Disadvantaged: A Cautionary Tale from Open Banking [0.3867363075280543]
Open Banking has ignited a revolution in financial services, opening new opportunities for customer acquisition, management, retention, and risk assessment.
We investigate the dimensions of financial vulnerability (FV), a global concern resulting from COVID-19 and rising inflation.
Using a unique dataset from a UK FinTech lender, we demonstrate the power of fine-grained transaction data while simultaneously cautioning its safe usage.
arXiv Detail & Related papers (2023-07-25T11:07:43Z)
- (Local) Differential Privacy has NO Disparate Impact on Fairness [4.157415305926584]
Local Differential Privacy (LDP) has gained widespread adoption in real-world applications.
This paper empirically studies the impact of collecting multiple sensitive attributes under LDP on fairness.
arXiv Detail & Related papers (2023-04-25T14:18:12Z)
- When Fairness Meets Privacy: Fair Classification with Semi-Private Sensitive Attributes [18.221858247218726]
We study a novel and practical problem of fair classification in a semi-private setting.
Most of the sensitive attributes are private, and only a small number of clean ones are available.
We propose a novel framework FairSP that can achieve Fair prediction under the Semi-Private setting.
arXiv Detail & Related papers (2022-07-18T01:10:25Z)
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it faces limitations when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
- "You Can't Fix What You Can't Measure": Privately Measuring Demographic Performance Disparities in Federated Learning [78.70083858195906]
We propose differentially private mechanisms to measure differences in performance across groups while protecting the privacy of group membership.
Our results show that, contrary to what prior work suggested, protecting privacy is not necessarily in conflict with identifying performance disparities of federated models.
arXiv Detail & Related papers (2022-06-24T09:46:43Z)
- Joint Multisided Exposure Fairness for Recommendation [76.75990595228666]
This paper formalizes a family of exposure fairness metrics that model the problem jointly from the perspective of both the consumers and producers.
Specifically, we consider group attributes for both types of stakeholders to identify and mitigate fairness concerns that go beyond individual users and items towards more systemic biases in recommendation.
arXiv Detail & Related papers (2022-04-29T19:13:23Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
- Decision Making with Differential Privacy under a Fairness Lens [65.16089054531395]
The U.S. Census Bureau releases data sets and statistics about groups of individuals that are used as input to a number of critical decision processes.
To conform to privacy and confidentiality requirements, these agencies are often required to release privacy-preserving versions of the data.
This paper studies the release of differentially private data sets and analyzes their impact on some critical resource allocation tasks under a fairness perspective.
arXiv Detail & Related papers (2021-05-16T21:04:19Z)
- On the Privacy Risks of Algorithmic Fairness [9.429448411561541]
We study the privacy risks of group fairness through the lens of membership inference attacks.
We show that fairness comes at the cost of privacy, and this cost is not distributed equally.
arXiv Detail & Related papers (2020-11-07T09:15:31Z)