BUDS: Balancing Utility and Differential Privacy by Shuffling
- URL: http://arxiv.org/abs/2006.04125v1
- Date: Sun, 7 Jun 2020 11:39:13 GMT
- Title: BUDS: Balancing Utility and Differential Privacy by Shuffling
- Authors: Poushali Sengupta, Sudipta Paul, Subhankar Mishra
- Abstract summary: Balancing utility and differential privacy by shuffling, or BUDS, is an approach to crowd-sourced, statistical databases.
A new algorithm is proposed using one-hot encoding and iterative shuffling with loss-estimation and risk-minimization techniques.
In empirical tests of balanced utility and privacy, BUDS produces $\epsilon = 0.02$, which is a very promising result.
- Score: 3.618133010429131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Balancing utility and differential privacy by shuffling, or \textit{BUDS},
is an approach to crowd-sourced, statistical databases that achieves a strong
balance between privacy and utility using differential privacy theory. Here, a
novel algorithm is proposed using one-hot encoding and iterative shuffling with
loss-estimation and risk-minimization techniques to balance both utility and
privacy. In this work, after one-hot encoded data is collected from different
sources and clients, a novel attribute-shuffling step using iterative shuffling
(based on the query asked by the analyst), together with loss estimation via an
update function and risk minimization, produces a utility- and privacy-balanced
differentially private report. In empirical tests of balanced utility and
privacy, BUDS produces $\epsilon = 0.02$, which is a very promising result. Our
algorithm maintains a privacy bound of $\epsilon = \ln[t/((n_1 - 1)^S)]$ and a
loss bound of $c' \big|e^{\ln[t/((n_1 - 1)^S)]} - 1\big|$.
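The abstract describes the pipeline only in prose, so the following minimal sketch illustrates the two concrete steps it names, one-hot encoding and attribute-wise iterative shuffling. This is our illustration with invented data and names, not the authors' implementation; the loss-estimation and risk-minimization stages are omitted.

```python
import numpy as np

# Sketch only (not the authors' code): one-hot encode categorical records,
# then shuffle each attribute block with an independent row permutation so
# cross-attribute linkage is destroyed while per-attribute counts survive.

def one_hot(values, categories):
    """Encode each categorical value as a one-hot row vector."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)), dtype=int)
    for row, v in enumerate(values):
        out[row, index[v]] = 1
    return out

def iterative_shuffle(attribute_blocks, rng):
    """Apply an independent random permutation to each attribute block."""
    return [block[rng.permutation(len(block))] for block in attribute_blocks]

rng = np.random.default_rng(0)
age = one_hot(["18-25", "26-40", "18-25", "41+"], ["18-25", "26-40", "41+"])
sex = one_hot(["F", "M", "F", "M"], ["F", "M"])
shuffled = iterative_shuffle([age, sex], rng)

# Marginal counts, which answer the analyst's aggregate queries, are intact:
assert (shuffled[0].sum(axis=0) == age.sum(axis=0)).all()
```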
Related papers
- On Computing Pairwise Statistics with Local Differential Privacy [55.81991984375959]
We study the problem of computing pairwise statistics, i.e., ones of the form $\binom{n}{2}^{-1} \sum_{i \ne j} f(x_i, x_j)$, where $x_i$ denotes the input to the $i$th user, with differential privacy (DP) in the local model.
This formulation captures important metrics such as Kendall's $\tau$ coefficient, Area Under Curve, Gini's mean difference, Gini's entropy, etc.
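As a concrete (non-private) instance of such a pairwise statistic, the sketch below computes Gini's mean difference; the paper's local-DP protocol for estimating it is not reproduced here.

```python
import itertools
from math import comb

# Gini's mean difference: the average of |x_i - x_j| over all pairs, one of
# the pairwise statistics the paper targets. Computed in the clear here; the
# paper's contribution is estimating such statistics under local DP.

def gini_mean_difference(x):
    n = len(x)
    pair_sum = sum(abs(a - b) for a, b in itertools.combinations(x, 2))
    return pair_sum / comb(n, 2)

print(gini_mean_difference([1.0, 3.0, 7.0]))  # (2 + 6 + 4) / 3 = 4.0
```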
arXiv Detail & Related papers (2024-06-24T04:06:09Z) - A Generalized Shuffle Framework for Privacy Amplification: Strengthening Privacy Guarantees and Enhancing Utility [4.7712438974100255]
We show how to shuffle in the $(\epsilon_i,\delta_i)$-PLDP setting with personalized privacy parameters.
We prove that the shuffled $(\epsilon_i,\delta_i)$-PLDP process approximately preserves $\mu$-Gaussian Differential Privacy with $\mu = \sqrt{\frac{2}{\sum_{i=1}^n \frac{1-\delta_i}{1+e^{\epsilon_i}} - \max_i \frac{1-\delta_i}{1+e^{\epsilon_i}}}}$.
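The $\mu$ above is a closed-form function of the per-user parameters, so it can be evaluated directly. A minimal sketch, assuming the reconstruction of the truncated abstract formula above is correct:

```python
import math

# mu-GDP parameter after shuffling personalized-LDP reports, per the formula
# stated in the abstract (reconstructed from a truncated rendering; treat as
# indicative). Requires at least two users so the denominator is positive.

def shuffled_gdp_mu(eps, delta):
    terms = [(1 - d) / (1 + math.exp(e)) for e, d in zip(eps, delta)]
    return math.sqrt(2 / (sum(terms) - max(terms)))

# Example: 1000 users with heterogeneous local guarantees.
eps = [0.5 + 0.001 * i for i in range(1000)]
delta = [1e-6] * 1000
print(shuffled_gdp_mu(eps, delta))  # smaller mu = stronger central guarantee
```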
arXiv Detail & Related papers (2023-12-22T02:31:46Z) - Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis
Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power.
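To give a feel for that scale (the paper's constants are suppressed, so this is only indicative):

```python
import math

# Indicative scale of the O(log M) threshold: epsilon may grow roughly like
# the log of the domain size before reconstruction becomes easy.
for M in (2, 256, 10**6, 2**32):
    print(f"domain size M = {M:>10}  log M = {math.log(M):6.2f}")
```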
arXiv Detail & Related papers (2022-10-24T23:50:12Z) - Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it faces challenges when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Infinitely Divisible Noise in the Low Privacy Regime [9.39772079241093]
Federated learning, in which training data is distributed among users and never shared, has emerged as a popular approach to privacy-preserving machine learning.
We present the first infinitely divisible noise distribution for real-valued data that achieves $\varepsilon$-differential privacy.
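The paper's new distribution is not reproduced here; as a classical illustration of what infinite divisibility buys in the federated setting, Laplace noise decomposes into Gamma differences, so each of $n$ users can add a small share whose sum is exactly Laplace:

```python
import numpy as np

# Classical fact (not the paper's construction): if each of n users adds
# Gamma(1/n, b) - Gamma(1/n, b) noise, the shares sum to Exp(b) - Exp(b),
# i.e., exactly Laplace(0, b), so the aggregator sees standard DP noise
# without any single party knowing the total.
rng = np.random.default_rng(1)
n, b, trials = 100, 1.0, 5000
shares = rng.gamma(1.0 / n, b, (trials, n)) - rng.gamma(1.0 / n, b, (trials, n))
totals = shares.sum(axis=1)  # one Laplace(0, b) sample per trial
print(totals.var())  # should approach Var[Laplace(0, 1)] = 2.0
```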
arXiv Detail & Related papers (2021-10-13T08:16:43Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Output Perturbation for Differentially Private Convex Optimization with
Improved Population Loss Bounds, Runtimes and Applications to Private
Adversarial Training [12.386462516398469]
Finding efficient, easily implementable differentially private (DP) algorithms that offer strong excess risk bounds is an important problem in modern machine learning.
We provide the tightest known $(\epsilon, 0)$-DP population loss bounds and fastest runtimes under the presence of smoothness and strong convexity.
We apply our theory to two learning frameworks: tilted ERM and adversarial learning frameworks.
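For context, output perturbation itself is the simple recipe the paper refines: solve the optimization non-privately, then add noise calibrated to the minimizer's sensitivity. A hedged sketch under classical assumptions (an $L$-Lipschitz loss and $\lambda$-strongly convex objective, which squared loss satisfies only on bounded domains; all parameters illustrative):

```python
import numpy as np

def output_perturbation_ridge(X, y, lam, L, eps, rng):
    """Non-private ridge solution plus high-dimensional Laplace noise."""
    n, d = X.shape
    # Minimizer of (1/2n)||Xw - y||^2 + (lam/2)||w||^2.
    w = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    # Classical L2-sensitivity bound for strongly convex ERM.
    sensitivity = 2 * L / (n * lam)
    # Sample noise with density proportional to exp(-eps * ||z|| / sensitivity):
    # Gamma-distributed norm times a uniformly random direction.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    noise = rng.gamma(d, sensitivity / eps) * direction
    return w + noise

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = X @ np.ones(5) + rng.normal(size=500)
print(output_perturbation_ridge(X, y, lam=0.1, L=1.0, eps=1.0, rng=rng))
```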
arXiv Detail & Related papers (2021-02-09T08:47:06Z) - Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy
Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
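A minimal sketch of the shuffle model being analyzed (illustrative names; the paper's tight amplification bound is not computed here): each user randomizes locally, a trusted shuffler discards the order, and the analyst debiases the aggregate.

```python
import numpy as np

# Local randomized response with parameter eps0, followed by a shuffler that
# releases only the unordered reports. Shuffling amplifies the local eps0
# into a much smaller central epsilon (the paper's analysis, omitted here).

def randomized_response(bit, eps0, rng):
    p_truth = np.exp(eps0) / (np.exp(eps0) + 1)
    return bit if rng.random() < p_truth else 1 - bit

def shuffle_mechanism(bits, eps0, rng):
    noisy = np.array([randomized_response(b, eps0, rng) for b in bits])
    rng.shuffle(noisy)  # the shuffler: only the multiset is observable
    return noisy

rng = np.random.default_rng(3)
eps0 = 1.0
reports = shuffle_mechanism([0, 1, 1, 0, 1] * 200, eps0, rng)

# Debias the mean: E[report] = (2p - 1) * bit + (1 - p), p = e^eps0/(1+e^eps0).
p = np.exp(eps0) / (np.exp(eps0) + 1)
print((reports.mean() - (1 - p)) / (2 * p - 1))  # ~ 0.6, the true mean
```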
arXiv Detail & Related papers (2020-12-23T17:07:26Z)