Production of Categorical Data Verifying Differential Privacy:
Conception and Applications to Machine Learning
- URL: http://arxiv.org/abs/2204.00850v1
- Date: Sat, 2 Apr 2022 12:50:14 GMT
- Title: Production of Categorical Data Verifying Differential Privacy:
Conception and Applications to Machine Learning
- Authors: Héber H. Arcolezi
- Abstract summary: Differential privacy is a formal definition that allows quantifying the privacy-utility trade-off.
With the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server.
In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Private and public organizations regularly collect and analyze digitized
data about their associates, volunteers, clients, etc. However, because most
personal data are sensitive, designing privacy-preserving systems is a key
challenge. To tackle privacy concerns, research communities have proposed
different methods to preserve privacy, with differential privacy (DP) standing
out as a formal definition that allows quantifying the privacy-utility
trade-off. Moreover, under the local DP (LDP) model, users can sanitize their
data locally before transmitting it to the server. The objective of this
thesis is thus twofold: O$_1$) to improve the utility and privacy of multiple
frequency estimates under LDP guarantees, which is fundamental to statistical
learning, and O$_2$) to assess the privacy-utility trade-off of machine
learning (ML) models trained over differentially private data. For O$_1$, we
first tackled the problem from two "multiple" perspectives, i.e., multiple
attributes and multiple data collections over time, while focusing on utility.
Second, we turned to the multiple-attributes setting alone, proposing a
solution that focuses on privacy while preserving utility. In both cases, we
demonstrated through analytical and experimental validation the advantages of
our proposed solutions over state-of-the-art LDP protocols. For O$_2$, we
empirically evaluated ML-based solutions designed to solve real-world problems
while ensuring DP guarantees, mainly using the input data perturbation setting
from the privacy-preserving ML literature. In this setting, each record of the
dataset is sanitized independently, so we implemented LDP algorithms from the
perspective of the centralized data owner. In all cases, we concluded that
differentially private ML models achieve nearly the same utility metrics as
non-private ones.
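To make the LDP frequency-estimation setting concrete, here is a minimal
sketch of Generalized Randomized Response (GRR), a standard LDP protocol for
categorical data on which this line of work builds. It illustrates the general
technique only, not the specific solutions proposed in the thesis; the domain,
epsilon, and data below are made up for the example.

```python
import numpy as np

def grr_sanitize(value, domain, epsilon, rng):
    """GRR: report the true value with probability
    p = e^eps / (e^eps + k - 1), else one of the other k - 1 values
    uniformly at random."""
    k = len(domain)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimates from GRR reports:
    f_hat_v = (c_v / n - q) / (p - q), with q = (1 - p) / (k - 1)."""
    n, k = len(reports), len(domain)
    p = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)
    counts = np.array([np.sum(np.asarray(reports) == v) for v in domain])
    return (counts / n - q) / (p - q)

# Input-data-perturbation setting: a centralized owner sanitizes every
# record independently before any release or model training.
rng = np.random.default_rng(0)
domain = ["A", "B", "C", "D"]
data = rng.choice(domain, size=10_000, p=[0.5, 0.3, 0.15, 0.05])
reports = [grr_sanitize(v, domain, epsilon=1.0, rng=rng) for v in data]
print(grr_estimate(reports, domain, epsilon=1.0))  # ~[0.50, 0.30, 0.15, 0.05]
```

The "multiple attributes" and "multiple collections over time" settings
generalize this single-attribute estimation step; splitting the privacy budget
across attributes or repeated collections is what degrades utility, which
motivates O$_1$.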
Related papers
- Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy, which allows controlling the sensitive regions where differential privacy is applied.
Our method operates selectively on the data, allowing non-sensitive temporal regions to be defined without DP application, or combining differential privacy with other privacy techniques within data samples (a toy sketch of this selective-noising idea appears after this list).
arXiv Detail & Related papers (2024-10-22T15:22:53Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Personalized Differential Privacy for Ridge Regression [3.4751583941317166]
We introduce our novel Personalized-DP Output Perturbation method (PDP-OP), which enables training Ridge regression models with individual per-data-point privacy levels.
We provide rigorous privacy proofs for our PDP-OP as well as accuracy guarantees for the resulting model.
We show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al. (a minimal output-perturbation sketch appears after this list).
arXiv Detail & Related papers (2024-01-30T16:00:14Z)
- Federated Experiment Design under Distributed Differential Privacy [31.06808163362162]
We focus on rigorously protecting users' privacy while minimizing the trust placed in service providers.
Although a vital component in modern A/B testing, private distributed experimentation has not previously been studied.
We show how these mechanisms can be scaled up to handle the very large number of participants commonly found in practice.
arXiv Detail & Related papers (2023-11-07T22:38:56Z)
- Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework [6.828884629694705]
This article proposes a conceptual model called PrivChatGPT, a privacy-preserving generative model for LLMs.
PrivChatGPT consists of two main components: (i) preserving user privacy during data curation/pre-processing, together with preserving private context, and (ii) a private training process for large-scale data.
arXiv Detail & Related papers (2023-10-19T06:55:13Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion a subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server, which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
- Personalized PATE: Differential Privacy for Machine Learning with Individual Privacy Guarantees [1.2691047660244335]
We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data.
Our experiments show that our personalized privacy methods yield higher accuracy models than the non-personalized baseline.
arXiv Detail & Related papers (2022-02-21T20:16:27Z)
- Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL).
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP).
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)
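The masked differential privacy entry above describes applying DP only to
sensitive regions of each sample. Below is a toy sketch of that general idea,
not the authors' method: the mask, the Laplace mechanism, and the sensitivity
value are assumptions made up for illustration.

```python
import numpy as np

def selectively_noised(x, sensitive_mask, epsilon, sensitivity, rng):
    """Toy sketch: add Laplace noise only where the mask flags a
    sensitive region; non-sensitive entries pass through unchanged.
    Correctly calibrating `sensitivity` is the hard part and is
    assumed to be given here."""
    noisy = x.astype(float)  # astype returns a copy, so x is untouched
    noise = rng.laplace(0.0, sensitivity / epsilon, size=x.shape)
    noisy[sensitive_mask] += noise[sensitive_mask]
    return noisy

rng = np.random.default_rng(0)
frame = rng.random((4, 4))           # e.g., a tiny image patch
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                # mark the center as sensitive
print(selectively_noised(frame, mask, epsilon=1.0, sensitivity=1.0, rng=rng))
```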
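Similarly, the PDP-OP entry builds on output perturbation for ridge
regression: fit the model, then add noise to the learned weights. The minimal
uniform-$\epsilon$ sketch below shows only that baseline, not PDP-OP's
per-data-point calibration; the Gaussian mechanism and the caller-supplied
sensitivity bound are assumptions for illustration.

```python
import numpy as np

def dp_ridge_output_perturbation(X, y, lam, epsilon, delta, sensitivity, rng):
    """Fit ridge regression, then add Gaussian noise to the weights
    (output perturbation). `sensitivity` is an assumed L2 bound on how
    much the weights can change between neighboring datasets; deriving
    it requires assumptions on the data scale."""
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)  # ridge solution
    # Gaussian mechanism: sigma = Delta * sqrt(2 ln(1.25/delta)) / epsilon
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return w + rng.normal(0.0, sigma, size=d)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)
w_dp = dp_ridge_output_perturbation(X, y, lam=10.0, epsilon=1.0,
                                    delta=1e-5, sensitivity=0.01, rng=rng)
print(w_dp)  # near [1.0, -2.0, 0.5] for this loose sensitivity bound
```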