Task-aware Privacy Preservation for Multi-dimensional Data
- URL: http://arxiv.org/abs/2110.02329v1
- Date: Tue, 5 Oct 2021 20:03:53 GMT
- Title: Task-aware Privacy Preservation for Multi-dimensional Data
- Authors: Jiangnan Cheng, Ao Tang, Sandeep Chinchali
- Abstract summary: Local differential privacy (LDP) is a state-of-the-art technique for privacy preservation.
In the future, LDP can be adopted to anonymize richer user data attributes.
We show how to significantly improve the ultimate task performance for multi-dimensional user data by considering a task-aware privacy preservation problem.
- Score: 4.138783926370621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Local differential privacy (LDP), a state-of-the-art technique for privacy
preservation, has been successfully deployed in a few real-world applications.
In the future, LDP can be adopted to anonymize richer user data attributes that
will be input to more sophisticated machine learning (ML) tasks. However,
today's LDP approaches are largely task-agnostic and often lead to sub-optimal
performance -- they will simply inject noise to all data attributes according
to a given privacy budget, regardless of what features are most relevant for an
ultimate task. In this paper, we address how to significantly improve the
ultimate task performance for multi-dimensional user data by considering a
task-aware privacy preservation problem. The key idea is to use an
encoder-decoder framework to learn (and anonymize) a task-relevant latent
representation of user data, which gives an analytical near-optimal solution
for a linear setting with mean-squared error (MSE) task loss. We also provide
an approximate solution through a learning algorithm for general nonlinear
cases. Extensive experiments demonstrate that our task-aware approach
significantly improves ultimate task accuracy compared to a standard benchmark
LDP approach while guaranteeing the same level of privacy.
Related papers
- DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing [0.8739101659113155]
We introduce an effective data publishing algorithm emphDP-CDA.
Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner, and inducing carefully-tuned randomness to ensure privacy guarantees.
Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.
arXiv Detail & Related papers (2024-11-25T06:14:06Z) - Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose PseudoProbability Unlearning (PPU), a novel method that enables models to forget data to adhere to privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Differentially Private Active Learning: Balancing Effective Data Selection and Privacy [11.716423801223776]
We introduce differentially private active learning (DP-AL) for standard learning settings.
We demonstrate that naively integrating DP-SGD training into AL presents substantial challenges in privacy budget allocation and data utilization.
Our experiments on vision and natural language processing tasks show that DP-AL can improve performance for specific datasets and model architectures.
arXiv Detail & Related papers (2024-10-01T09:34:06Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - The Data Minimization Principle in Machine Learning [61.17813282782266]
Data minimization aims to reduce the amount of data collected, processed or retained.
It has been endorsed by various global data protection regulations.
However, its practical implementation remains a challenge due to the lack of a rigorous formulation.
arXiv Detail & Related papers (2024-05-29T19:40:27Z) - Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization [2.28438857884398]
We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction and an Expectation Maximization (EM) approach optimized for structured data privacy.
Our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy.
The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types.
arXiv Detail & Related papers (2024-04-24T22:58:42Z) - A Secure Federated Data-Driven Evolutionary Multi-objective Optimization
Algorithm [18.825123863744906]
Most data-driven evolutionary algorithms are centralized, causing privacy and security concerns.
This paper proposes a secure federated data-driven evolutionary multi-objective optimization algorithm.
Experimental results show that the proposed algorithm can protect privacy and enhance security with only negligible sacrifice.
arXiv Detail & Related papers (2022-10-15T13:39:49Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Production of Categorical Data Verifying Differential Privacy:
Conception and Applications to Machine Learning [0.0]
Differential privacy is a formal definition that allows quantifying the privacy-utility trade-off.
With the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server.
In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
arXiv Detail & Related papers (2022-04-02T12:50:14Z) - Policy Information Capacity: Information-Theoretic Measure for Task
Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.