Representation Learning for High-Dimensional Data Collection under Local
Differential Privacy
- URL: http://arxiv.org/abs/2010.12464v3
- Date: Sat, 14 May 2022 11:38:04 GMT
- Title: Representation Learning for High-Dimensional Data Collection under Local
Differential Privacy
- Authors: Alex Mansbridge, Gregory Barbour, Davide Piras, Michael Murray,
Christopher Frye, Ilya Feige, David Barber
- Abstract summary: Local differential privacy (LDP) offers a rigorous approach to preserving privacy.
Existing LDP mechanisms have successfully been applied to low-dimensional data.
In high dimensions the privacy-inducing noise largely destroys the utility of the data.
- Score: 18.98782927283319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The collection of individuals' data has become commonplace in many
industries. Local differential privacy (LDP) offers a rigorous approach to
preserving privacy whereby the individual privatises their data locally,
allowing only their perturbed datum to leave their possession. LDP thus
provides a provable privacy guarantee to the individual against both
adversaries and database administrators. Existing LDP mechanisms have
successfully been applied to low-dimensional data, but in high dimensions the
privacy-inducing noise largely destroys the utility of the data. In this work,
our contributions are twofold. First, by adapting state-of-the-art techniques
from representation learning, we introduce a novel approach to learning LDP
mechanisms. These mechanisms add noise to powerful representations on the
low-dimensional manifold underlying the data, thereby overcoming the
prohibitive noise requirements of LDP in high dimensions. Second, we introduce
a novel denoising approach for downstream model learning. The training of
performant machine learning models using collected LDP data is a common goal
for data collectors, and downstream model performance forms a proxy for the LDP
data utility. Our approach significantly outperforms current state-of-the-art
LDP mechanisms.
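As a rough sketch of the collection scheme the abstract describes, the snippet below applies the classical Laplace mechanism to a low-dimensional representation rather than to the raw datum. The encoder, clipping bound, representation dimension, and privacy budget here are all illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def encode(x: np.ndarray) -> np.ndarray:
    # Placeholder for the paper's learned encoder: any map from a
    # high-dimensional datum to a low-dimensional representation.
    # A fixed random projection stands in here (an assumption, not
    # the paper's architecture).
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, x.size))  # representation dimension d = 8
    return W @ x

def privatize(x: np.ndarray, epsilon: float, clip: float = 1.0) -> np.ndarray:
    """epsilon-LDP release of a learned representation via the Laplace mechanism."""
    z = encode(x)
    # Clip to an L1 ball of radius `clip`, so the L1 distance between any
    # two users' representations (the sensitivity) is at most 2 * clip.
    norm = np.abs(z).sum()
    if norm > clip:
        z = z * (clip / norm)
    scale = 2.0 * clip / epsilon  # Laplace scale calibrated to sensitivity
    return z + np.random.default_rng().laplace(0.0, scale, size=z.shape)

# The user privatizes locally; only the noisy representation leaves the device.
noisy_rep = privatize(np.random.rand(10_000), epsilon=1.0)
```

Because the noise is calibrated to the 8-dimensional representation rather than the 10,000-dimensional input, the signal-to-noise ratio stays manageable, which is the high-dimensional obstacle the paper targets.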
Related papers
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
However, FedIT encounters limitations such as the scarcity of instruction data and the risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
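FewFedPIT's own algorithm is not reproduced in the summary above; as background, federated instruction tuning builds on the standard federated averaging step sketched below (names and weighting are illustrative, not the paper's method).

```python
import numpy as np

def fed_avg(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """One round of federated averaging: the server combines locally
    fine-tuned parameters, weighted by local dataset size; raw
    instruction data never leaves a client."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three data owners with unequal amounts of local instruction data.
updates = [np.random.rand(4) for _ in range(3)]
global_params = fed_avg(updates, client_sizes=[100, 300, 600])
```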
- LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls sample generation at a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z)
- Local Privacy-preserving Mechanisms and Applications in Machine Learning [0.21268495173320798]
Local Differential Privacy (LDP) provides strong privacy protection for individual users during the stages of data collection and processing.
One of the major applications of these privacy-preserving mechanisms is machine learning.
arXiv Detail & Related papers (2024-01-08T22:29:00Z)
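As a concrete example of the collection-time LDP mechanisms this survey covers, here is classical randomized response on a single bit, together with the standard unbiased server-side frequency estimator (a minimal sketch; variable names are illustrative):

```python
import math, random

def randomized_response(bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1),
    otherwise flip it; this satisfies epsilon-LDP."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def estimate_frequency(reports: list[int], epsilon: float) -> float:
    """Unbiased server-side estimate of the true fraction of 1s."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

reports = [randomized_response(1, epsilon=1.0) for _ in range(10_000)]
print(estimate_frequency(reports, epsilon=1.0))  # ~1.0 in expectation
```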
- Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning [32.52811740662061]
This article introduces DP-LoRA, a novel federated learning algorithm tailored for large language models (LLMs).
DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise to weight updates, allowing collaborative model training while protecting individual data.
arXiv Detail & Related papers (2023-12-29T06:50:38Z)
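The Gaussian-mechanism step the DP-LoRA summary describes can be sketched generically: clip each client's weight update in L2 norm, then add Gaussian noise before upload. The clipping bound and noise multiplier below are assumed values, not the paper's settings.

```python
import numpy as np

def privatize_update(update: np.ndarray, clip: float = 1.0,
                     noise_multiplier: float = 1.0) -> np.ndarray:
    """Clip a weight update to L2 norm <= clip, then add Gaussian noise
    with std = noise_multiplier * clip (the Gaussian mechanism)."""
    norm = np.linalg.norm(update)
    if norm > clip:
        update = update * (clip / norm)
    noise = np.random.default_rng().normal(0.0, noise_multiplier * clip,
                                           size=update.shape)
    return update + noise

# Each client perturbs its (e.g. low-rank adapter) update before upload.
noisy = privatize_update(np.random.rand(64))
```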
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
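DP2-Pub's first phase, per the summary above, groups attributes into cohesive low-dimensional clusters. A loose sketch of that idea, using empirical pairwise mutual information as the cohesion measure with a greedy merge rule (the scoring and threshold are assumptions, not the paper's algorithm):

```python
import numpy as np
from itertools import combinations

def mutual_information(a: np.ndarray, b: np.ndarray) -> float:
    """Empirical mutual information between two categorical columns."""
    n = len(a)
    joint = {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
    pa = {x: np.mean(a == x) for x in set(a)}
    pb = {y: np.mean(b == y) for y in set(b)}
    return sum((c / n) * np.log((c / n) / (pa[x] * pb[y]))
               for (x, y), c in joint.items())

def greedy_clusters(data: np.ndarray, threshold: float = 0.05) -> list:
    """Greedily merge attributes whose pairwise mutual information exceeds
    the threshold, yielding clusters with high intra-cluster cohesion and
    low inter-cluster coupling."""
    clusters = [{j} for j in range(data.shape[1])]
    for i, j in combinations(range(data.shape[1]), 2):
        if mutual_information(data[:, i], data[:, j]) > threshold:
            ci = next(c for c in clusters if i in c)
            cj = next(c for c in clusters if j in c)
            if ci is not cj:
                ci |= cj
                clusters.remove(cj)
    return clusters

# Five categorical attributes over 500 records.
data = np.random.default_rng(0).integers(0, 3, size=(500, 5))
print(greedy_clusters(data))
```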
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Production of Categorical Data Verifying Differential Privacy: Conception and Applications to Machine Learning [0.0]
Differential privacy is a formal definition that allows quantifying the privacy-utility trade-off.
With the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server.
In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
arXiv Detail & Related papers (2022-04-02T12:50:14Z)
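The categorical LDP setting of the paper above is commonly served by generalized randomized response (k-RR), sketched here with an illustrative domain and budget; this is the textbook mechanism, not necessarily the paper's construction.

```python
import math, random

def k_rr(value: str, domain: list, epsilon: float) -> str:
    """Generalized randomized response: keep the true category with
    probability e^eps / (e^eps + k - 1), otherwise report one of the
    remaining k - 1 categories uniformly. Satisfies epsilon-LDP."""
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_keep:
        return value
    return random.choice([v for v in domain if v != value])

domain = ["red", "green", "blue", "yellow"]
report = k_rr("green", domain, epsilon=1.0)
```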
- LDP-Fed: Federated Learning with Local Differential Privacy [14.723892247530234]
We present LDP-Fed, a novel federated learning system with a formal privacy guarantee using local differential privacy (LDP).
Existing LDP protocols are developed primarily to ensure data privacy in the collection of single numerical or categorical values.
In federated learning, model parameter updates are collected iteratively from each participant.
arXiv Detail & Related papers (2020-06-05T19:15:13Z)
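For the single-numerical-value collection the LDP-Fed summary mentions, a standard choice is the unbiased one-bit mechanism of Duchi et al. for values in [-1, 1]; the sketch below is illustrative background rather than LDP-Fed's own protocol.

```python
import math, random

def one_bit_mechanism(x: float, epsilon: float) -> float:
    """epsilon-LDP release of x in [-1, 1] that is unbiased:
    E[output] = x. The output is one of two fixed magnitudes +/- t."""
    e = math.exp(epsilon)
    t = (e + 1) / (e - 1)
    p = 0.5 + x * (e - 1) / (2 * (e + 1))  # probability of reporting +t
    return t if random.random() < p else -t

# The server averages many reports to estimate the population mean.
reports = [one_bit_mechanism(0.3, epsilon=1.0) for _ in range(100_000)]
print(sum(reports) / len(reports))  # ~0.3 in expectation
```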
- User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving private data from mobile terminals (MTs) while training the data into useful models.
From an information-theoretic viewpoint, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)