Human-Centered Interactive Anonymization for Privacy-Preserving Machine Learning: A Case for Human-Guided k-Anonymity
- URL: http://arxiv.org/abs/2507.04104v1
- Date: Sat, 05 Jul 2025 17:20:18 GMT
- Title: Human-Centered Interactive Anonymization for Privacy-Preserving Machine Learning: A Case for Human-Guided k-Anonymity
- Authors: Sri Harsha Gajavalli
- Abstract summary: We propose an interactive approach that incorporates human input into the k-anonymization process. Using the UCI Adult dataset, we compare classification outcomes of interactive human-influenced anonymization with traditional, fully automated methods. Our results show that human input can enhance data utility in some cases, although results vary across tasks and settings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy-preserving machine learning (ML) seeks to balance data utility and privacy, especially as regulations like the GDPR mandate the anonymization of personal data for ML applications. Conventional anonymization approaches often reduce data utility due to indiscriminate generalization or suppression of data attributes. In this study, we propose an interactive approach that incorporates human input into the k-anonymization process, enabling domain experts to guide attribute preservation based on contextual importance. Using the UCI Adult dataset, we compare classification outcomes of interactive human-influenced anonymization with traditional, fully automated methods. Our results show that human input can enhance data utility in some cases, although results vary across tasks and settings. We discuss limitations of our approach and suggest potential areas for improved interactive frameworks in privacy-aware ML.
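To make the proposed workflow concrete, here is a minimal sketch of human-guided k-anonymization: a domain expert supplies per-attribute importance weights, and the least important quasi-identifiers are generalized first until every equivalence class contains at least k records. The generalization rules (numeric binning, string prefixes) and all function and parameter names are illustrative assumptions, not the paper's implementation.

```python
import pandas as pd

def generalize(col: pd.Series, level: int) -> pd.Series:
    """Coarsen a column by `level` steps: widen numeric bins, shorten string prefixes."""
    if level == 0:
        return col.astype(str)
    if pd.api.types.is_numeric_dtype(col):
        width = 5 * (2 ** level)              # bin width doubles with each level
        lo = (col // width) * width
        return lo.astype(int).astype(str) + "-" + (lo + width - 1).astype(int).astype(str)
    return col.astype(str).str[: max(1, 4 - level)] + "*"

def human_guided_k_anonymize(df, quasi_ids, k, importance):
    """Round-robin generalization of quasi-identifiers, least important first
    (per the expert-supplied `importance`), until each equivalence class has >= k rows."""
    out = df.copy()
    out[quasi_ids] = out[quasi_ids].astype(str)
    levels = {c: 0 for c in quasi_ids}
    order = sorted(quasi_ids, key=lambda c: importance[c])   # low importance coarsened first
    i = 0
    while out.groupby(quasi_ids).size().min() < k:
        col = order[i % len(order)]
        levels[col] += 1
        out[col] = generalize(df[col], levels[col])          # re-derive from the raw column
        if levels[col] >= 6:                                 # fall back to full suppression
            out[col] = "*"
        i += 1
    return out

# Toy rows mimicking quasi-identifiers in the UCI Adult dataset.
toy = pd.DataFrame({"age": [23, 25, 37, 39, 52, 54],
                    "zip": ["13053", "13068", "14850", "14853", "02139", "02142"]})
anon = human_guided_k_anonymize(toy, ["age", "zip"], k=2,
                                importance={"age": 2, "zip": 1})  # expert keeps age finer
```

With k = 2 on these toy rows, the expert's weighting coarsens zip codes to three-character prefixes before touching age, which survives at decade granularity; a fully automated scheme with no importance ranking might generalize both attributes indiscriminately.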
Related papers
- Aim High, Stay Private: Differentially Private Synthetic Data Enables Public Release of Behavioral Health Information with High Utility [2.1715431485081593]
Differential Privacy (DP) provides formal guarantees against re-identification risks. We generate DP synthetic data for Phase 1 of the Lived Experiences Measured Using Rings Study (LEMURS). We evaluate the utility of the synthetic data using a framework informed by actual uses of the LEMURS dataset.
arXiv Detail & Related papers (2025-06-30T15:58:34Z)
- Differential Privacy in Machine Learning: From Symbolic AI to LLMs [49.1574468325115]
Differential privacy provides a formal framework to mitigate privacy risks. It ensures that the inclusion or exclusion of any single data point does not significantly alter the output of an algorithm (the formal definition is sketched after this list).
arXiv Detail & Related papers (2025-06-13T11:30:35Z)
- PASS: Private Attributes Protection with Stochastic Data Substitution [46.38957234350463]
Various methods have been proposed to protect private attributes by removing them from the data while maintaining the data's utility for downstream tasks. PASS substitutes the original sample with another one according to certain probabilities and is trained with a novel loss function. A comprehensive evaluation of PASS on datasets of different modalities, including facial images, human activity sensory signals, and voice recordings, substantiates its effectiveness and generalizability.
arXiv Detail & Related papers (2025-06-08T22:48:07Z)
- Self-Refining Language Model Anonymizers via Adversarial Distillation [49.17383264812234]
Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data poses emerging privacy risks. We introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization.
arXiv Detail & Related papers (2025-06-02T08:21:27Z)
- Privacy Preserving Machine Learning Model Personalization through Federated Personalized Learning [0.0]
There has been a seismic shift in interest towards Federated Learning (FL), the leading paradigm for training Machine Learning (ML) models on decentralized data silos while maintaining data privacy. This paper presents a comprehensive performance analysis of cutting-edge approaches to personalizing ML models while preserving privacy. According to our analysis, Adaptive Personalized Cross-Silo Federated Learning with Differential Privacy (APPLE+DP) offers efficient execution, whereas the Adaptive Personalized Cross-Silo Federated Learning with Homomorphic Encryption (APPLE+HE) algorithm is, overall, strongly recommended for privacy-preserving machine learning tasks.
arXiv Detail & Related papers (2025-05-03T11:31:38Z)
- Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains [9.123834467375532]
We explore the feasibility of using synthetic data generated from differentially private language models in place of real data to facilitate the development of NLP in high-stakes domains.
Our results show that prior simplistic evaluations have failed to highlight utility, privacy, and fairness issues in the synthetic data.
arXiv Detail & Related papers (2024-10-10T19:31:02Z)
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face the emerging challenge posed by the re-identification ability of large language models. We propose a framework composed of three key components: a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- Cloaked Classifiers: Pseudonymization Strategies on Sensitive Classification Tasks [4.66054169739129]
In this paper, we explore the balance between preserving data usefulness and ensuring robust privacy safeguards.
We share our method for manually pseudonymizing a multilingual radicalization dataset, ensuring performance comparable to the original data.
arXiv Detail & Related papers (2024-06-25T18:30:25Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Privacy Risks in Reinforcement Learning for Household Robots [42.675213619562975]
Privacy emerges as a pivotal concern within the realm of embodied AI, as the robot accesses substantial personal information. This paper proposes an attack on the training process of value-based and gradient-based algorithms, utilizing gradient inversion to reconstruct states, actions, and supervisory signals.
arXiv Detail & Related papers (2023-06-15T16:53:26Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Partial sensitivity analysis in differential privacy [58.730520380312676]
We investigate the impact of each input feature on the individual's privacy loss.
We experimentally evaluate our approach on queries over private databases.
We also explore our findings in the context of neural network training on synthetic data.
arXiv Detail & Related papers (2021-09-22T08:29:16Z)
- Sensitivity analysis in differentially private machine learning using hybrid automatic differentiation [54.88777449903538]
We introduce a novel hybrid automatic differentiation (AD) system for sensitivity analysis.
This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data.
Our approach can enable principled reasoning about privacy loss in the setting of data processing (a minimal autodiff-based illustration appears after this list).
arXiv Detail & Related papers (2021-07-09T07:19:23Z)
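Several of the summaries above lean on the standard notion of differential privacy; the informal statement in the "Differential Privacy in Machine Learning" entry corresponds to the usual formal condition, reproduced here for reference (setting delta = 0 gives pure epsilon-DP):

```latex
% A randomized mechanism M is (\varepsilon, \delta)-differentially private if,
% for all datasets D, D' differing in a single record and all measurable sets S,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```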
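As for the sensitivity-analysis entries, the sketch below shows how per-example gradients, the quantity whose norm is bounded or clipped when reasoning about sensitivity in DP training, can be obtained with a generic autodiff library. This is an illustration with a toy logistic-regression loss and made-up data, not the hybrid AD system of the cited paper.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Per-example logistic-regression loss.
    logit = jnp.dot(x, w)
    return -(y * jax.nn.log_sigmoid(logit) + (1.0 - y) * jax.nn.log_sigmoid(-logit))

# Differentiate w.r.t. the parameters, vectorized over examples.
per_example_grad = jax.vmap(jax.grad(loss), in_axes=(None, 0, 0))

w = jnp.zeros(3)                        # toy parameter vector
X = jnp.array([[1.0, 2.0, 0.5],
               [0.3, 1.1, 2.2]])        # toy inputs
y = jnp.array([1.0, 0.0])               # toy labels

grads = per_example_grad(w, X, y)       # shape (2, 3): one gradient per example
norms = jnp.linalg.norm(grads, axis=1)  # per-example L2 norms, as clipped in DP-SGD
```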