Differentially Private n-gram Extraction
- URL: http://arxiv.org/abs/2108.02831v1
- Date: Thu, 5 Aug 2021 19:53:16 GMT
- Title: Differentially Private n-gram Extraction
- Authors: Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni and Sergey Yekhanin
- Abstract summary: We revisit the problem of $n$-gram extraction in the differential privacy setting.
In this problem, given a corpus of private text data, the goal is to release as many $n-grams as possible while preserving user level privacy.
We develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state-of-the-art.
- Score: 19.401898070938593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit the problem of $n$-gram extraction in the differential privacy
setting. In this problem, given a corpus of private text data, the goal is to
release as many $n$-grams as possible while preserving user level privacy.
Extracting $n$-grams is a fundamental subroutine in many NLP applications such
as sentence completion, response generation for emails etc. The problem also
arises in other applications such as sequence mining, and is a generalization
of recently studied differentially private set union (DPSU). In this paper, we
develop a new differentially private algorithm for this problem which, in our
experiments, significantly outperforms the state-of-the-art. Our improvements
stem from combining recent advances in DPSU, privacy accounting, and new
heuristics for pruning in the tree-based approach initiated by Chen et al.
(2012).
Related papers
- Smoothed Normalization for Efficient Distributed Private Optimization [54.197255548244705]
Federated learning enables machine learning models with privacy of participants.
There is no differentially private distributed method for training, non-feedback problems.
We introduce a new distributed algorithm $alpha$-$sf NormEC$ with provable convergence guarantees.
arXiv Detail & Related papers (2025-02-19T07:10:32Z) - Optimized Tradeoffs for Private Prediction with Majority Ensembling [59.99331405291337]
We introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm.
DaRRM is parameterized by a data-dependent noise function $gamma$, and enables efficient utility optimization over the class of all private algorithms.
We show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility.
arXiv Detail & Related papers (2024-11-27T00:48:48Z) - Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $Ain mathcalS$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
arXiv Detail & Related papers (2024-06-07T12:07:16Z) - Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent [1.0742675209112622]
Gradient Descent (SGD) is a widely used tool in machine learning.
Uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors.
We propose a novel block bootstrap for SGD under local differential privacy.
arXiv Detail & Related papers (2024-05-21T07:47:21Z) - Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z) - Privacy Profiles for Private Selection [21.162924003105484]
We work out an easy-to-use recipe that bounds privacy profiles of ReportNoisyMax and PrivateTuning using the privacy profiles of the base algorithms they corral.
Our approach improves over all regimes of interest and leads to substantial benefits in end-to-end private learning experiments.
arXiv Detail & Related papers (2024-02-09T08:31:46Z) - The Challenge of Differentially Private Screening Rules [32.18582226044492]
We develop the first differentially private screening rule for linear and logistic regression.
In doing so, we discover difficulties in the task of making a useful private screening rule due to the amount of noise added to ensure privacy.
arXiv Detail & Related papers (2023-03-18T01:45:34Z) - Smooth Anonymity for Sparse Graphs [69.1048938123063]
differential privacy has emerged as the gold standard of privacy, however, when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Improved Regret for Differentially Private Exploration in Linear MDP [31.567811502343552]
We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records.
We provide a private algorithm with an improved regret rate with an optimal dependence of $O(sqrtK)$ on the number of episodes.
arXiv Detail & Related papers (2022-02-02T21:32:09Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy
Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
arXiv Detail & Related papers (2020-12-23T17:07:26Z) - BUDS: Balancing Utility and Differential Privacy by Shuffling [3.618133010429131]
Balancing utility and differential privacy by shuffling or textitBUDS is an approach towards crowd-sourced, statistical databases.
New algorithm is proposed using one-hot encoding and iterative shuffling with the loss estimation and risk minimization techniques.
During empirical test of balanced utility and privacy, BUDS produces $epsilon = 0.02$ which is a very promising result.
arXiv Detail & Related papers (2020-06-07T11:39:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.