Representation Learning for High-Dimensional Data Collection under Local
Differential Privacy
- URL: http://arxiv.org/abs/2010.12464v3
- Date: Sat, 14 May 2022 11:38:04 GMT
- Title: Representation Learning for High-Dimensional Data Collection under Local
Differential Privacy
- Authors: Alex Mansbridge, Gregory Barbour, Davide Piras, Michael Murray,
Christopher Frye, Ilya Feige, David Barber
- Abstract summary: Local differential privacy (LDP) offers a rigorous approach to preserving privacy.
Existing LDP mechanisms have successfully been applied to low-dimensional data.
In high dimensions the privacy-inducing noise largely destroys the utility of the data.
- Score: 18.98782927283319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The collection of individuals' data has become commonplace in many
industries. Local differential privacy (LDP) offers a rigorous approach to
preserving privacy whereby the individual privatises their data locally,
allowing only their perturbed datum to leave their possession. LDP thus
provides a provable privacy guarantee to the individual against both
adversaries and database administrators. Existing LDP mechanisms have
successfully been applied to low-dimensional data, but in high dimensions the
privacy-inducing noise largely destroys the utility of the data. In this work,
our contributions are two-fold: first, by adapting state-of-the-art techniques
from representation learning, we introduce a novel approach to learning LDP
mechanisms. These mechanisms add noise to powerful representations on the
low-dimensional manifold underlying the data, thereby overcoming the
prohibitive noise requirements of LDP in high dimensions. Second, we introduce
a novel denoising approach for downstream model learning. The training of
performant machine learning models using collected LDP data is a common goal
for data collectors, and downstream model performance forms a proxy for the LDP
data utility. Our approach significantly outperforms current state-of-the-art
LDP mechanisms.
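To make the first contribution concrete, the following is a minimal sketch of the general recipe, not the authors' implementation: a datum is mapped by an encoder to a low-dimensional representation, which is clipped and perturbed with Laplace noise calibrated to its L1 sensitivity, so that only an epsilon-LDP release leaves the user's device. The encoder, dimensions, clipping bound, and epsilon below are all illustrative assumptions.

```python
import numpy as np

def privatise_latent(z, eps, bound=1.0):
    """Clip a latent vector to an L1 ball of radius `bound`, then add
    Laplace noise calibrated to the worst-case L1 gap between any two
    users' clipped latents, giving an eps-LDP release."""
    norm = np.sum(np.abs(z))
    if norm > bound:
        z = z * (bound / norm)
    sensitivity = 2.0 * bound      # any two clipped latents differ by <= 2*bound in L1
    scale = sensitivity / eps      # Laplace scale for eps-LDP
    return z + np.random.laplace(0.0, scale, size=z.shape)

# Hypothetical usage: a pretrained encoder maps a high-dimensional datum
# (e.g. a flattened 28x28 image) to a 16-dimensional representation.
rng = np.random.default_rng(0)
x = rng.normal(size=784)                 # stand-in for one user's datum
W = rng.normal(size=(16, 784)) / 28.0    # stand-in for a learned encoder
z_private = privatise_latent(W @ x, eps=2.0)
```

The second contribution, denoising for downstream model learning, would then act server-side on the collection of such privatised representations; its details are beyond this sketch.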
Related papers
- SafeSynthDP: Leveraging Large Language Models for Privacy-Preserving Synthetic Data Generation Using Differential Privacy [0.0]
We investigate the capability of Large Language Models (LLMs) to generate synthetic datasets with Differential Privacy (DP) mechanisms.
Our approach incorporates DP-based noise injection methods, including Laplace and Gaussian distributions, into the data generation process.
We then evaluate the utility of these DP-enhanced synthetic datasets by comparing the performance of ML models trained on them against models trained on the original data.
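As a hedged illustration of the noise-injection step mentioned above (function names and parameters are hypothetical, not taken from the paper), the Laplace and Gaussian mechanisms can be sketched as follows:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, eps):
    """eps-DP release via Laplace noise scaled to the L1 sensitivity."""
    return value + np.random.laplace(0.0, sensitivity / eps, np.shape(value))

def gaussian_mechanism(value, sensitivity, eps, delta):
    """(eps, delta)-DP release via Gaussian noise; the classical
    calibration below is valid for eps < 1."""
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / eps
    return value + np.random.normal(0.0, sigma, np.shape(value))

# Illustrative: privatise a count derived from the real data before the
# synthetic-data generator ever sees it.
noisy_count = laplace_mechanism(42.0, sensitivity=1.0, eps=0.5)
```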
arXiv Detail & Related papers (2024-12-30T01:10:10Z)
- Noisy Data Meets Privacy: Training Local Models with Post-Processed Remote Queries [7.993286956508782]
LDPKiT generates a privacy-preserving inference dataset aligned with private data distribution.
Experiments on Fashion-MNIST, SVHN and PathMNIST medical datasets demonstrate that LDPKiT effectively improves utility while preserving privacy.
arXiv Detail & Related papers (2024-05-25T21:53:58Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
However, FedIT faces limitations such as the scarcity of instruction data and the risk of training-data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls sample generation at a low privacy cost.
We theoretically analyze the model's privacy protection and verify it empirically.
arXiv Detail & Related papers (2024-02-26T11:52:55Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of their synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
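A rough sketch of the budget-splitting idea (illustrative only; DP2-Pub's actual two-phase algorithm is more involved): once attributes are grouped into clusters, each cluster's marginal is privatised with a share of the total budget, and sequential composition over the clusters, which cover the same records, bounds the overall cost.

```python
import numpy as np

def privatise_marginal(counts, eps):
    """Laplace-noised histogram for one attribute cluster (assuming L1
    sensitivity 1 under add/remove-one-record neighbouring)."""
    noisy = counts + np.random.laplace(0.0, 1.0 / eps, size=counts.shape)
    return np.clip(noisy, 0.0, None)   # post-process negatives away

# Sequential composition: the clusters describe the *same* records, so
# the per-cluster budgets must sum to the total budget.
total_eps, n_clusters = 1.0, 4
eps_per_cluster = total_eps / n_clusters
marginal = privatise_marginal(np.array([120.0, 75.0, 5.0]), eps_per_cluster)
```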
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve selective differential privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Production of Categorical Data Verifying Differential Privacy: Conception and Applications to Machine Learning [0.0]
Differential privacy is a formal definition that allows quantifying the privacy-utility trade-off.
With the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server.
In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
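A standard local sanitiser for categorical data of the kind this entry describes is k-ary (generalized) randomized response; the sketch below is a textbook version under that assumption, not necessarily the paper's exact protocol.

```python
import math
import random

def generalized_randomized_response(value, domain, eps):
    """k-ary randomized response: keep the true category with
    probability p = e^eps / (e^eps + k - 1), otherwise report one of
    the k - 1 other categories uniformly; this satisfies eps-LDP."""
    k = len(domain)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p:
        return value
    return random.choice([v for v in domain if v != value])

# Each user sanitises locally; only the perturbed category leaves the device.
report = generalized_randomized_response("blue", ["red", "green", "blue"], eps=1.0)
```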
arXiv Detail & Related papers (2022-04-02T12:50:14Z)
- LDP-Fed: Federated Learning with Local Differential Privacy [14.723892247530234]
We present LDP-Fed, a novel federated learning system with a formal privacy guarantee using local differential privacy (LDP).
Existing LDP protocols are developed primarily to ensure data privacy in the collection of single numerical or categorical values.
In federated learning, model parameter updates are collected iteratively from each participant.
arXiv Detail & Related papers (2020-06-05T19:15:13Z)
- User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving private data from mobile terminals (MTs) while training the data into useful models.
From an information-theoretic viewpoint, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
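A minimal sketch of the UDP idea as summarised here (the clipping bound, epsilon, and delta are illustrative assumptions): each terminal clips its model update in L2 and adds Gaussian noise locally before uploading.

```python
import numpy as np

def privatise_update(update, clip_norm, eps, delta):
    """Clip a model update in L2, then add Gaussian noise before upload;
    sigma follows the classical Gaussian-mechanism calibration (eps < 1)."""
    norm = np.linalg.norm(update)
    if norm > clip_norm:
        update = update * (clip_norm / norm)
    sensitivity = 2.0 * clip_norm   # worst-case L2 gap between two users' updates
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / eps
    return update + np.random.normal(0.0, sigma, size=update.shape)

# Illustrative: a mobile terminal perturbs its update locally, then uploads.
noisy_update = privatise_update(np.random.randn(1000) * 0.01,
                                clip_norm=1.0, eps=0.8, delta=1e-5)
```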
arXiv Detail & Related papers (2020-02-29T10:13:39Z)