Differentially Private Kernel Density Estimation
- URL: http://arxiv.org/abs/2409.01688v2
- Date: Tue, 5 Nov 2024 01:47:36 GMT
- Title: Differentially Private Kernel Density Estimation
- Authors: Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu
- Abstract summary: We introduce a refined differentially private (DP) data structure for kernel density estimation (KDE).
We study the following mathematical problem: given a similarity function $f$ and a private dataset $X \subset \mathbb{R}^d$, our goal is to preprocess $X$ so that for any query $y \in \mathbb{R}^d$, we approximate $\sum_{x \in X} f(x, y)$ in a differentially private fashion.
- Score: 11.526850085349155
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a refined differentially private (DP) data structure for kernel density estimation (KDE), offering not only an improved privacy-utility tradeoff but also better efficiency over prior results. Specifically, we study the following mathematical problem: given a similarity function $f$ (or DP KDE) and a private dataset $X \subset \mathbb{R}^d$, our goal is to preprocess $X$ so that for any query $y \in \mathbb{R}^d$, we approximate $\sum_{x \in X} f(x, y)$ in a differentially private fashion. The best previous algorithm for $f(x,y) = \|x - y\|_1$ is the node-contaminated balanced binary tree by [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. Their algorithm requires $O(nd)$ space and time for preprocessing with $n = |X|$. For any query point, the query time is $d \log n$, with an error guarantee of $(1+\alpha)$-approximation and additive error $\epsilon^{-1} \alpha^{-0.5} d^{1.5} R \log^{1.5} n$. In this paper, we improve the best previous result [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024] in three aspects:
  - We reduce the query time by a factor of $\alpha^{-1} \log n$.
  - We improve the approximation ratio from $\alpha$ to $1$.
  - We reduce the error dependence by a factor of $\alpha^{-0.5}$.
From a technical perspective, our method of constructing the search tree differs from previous work [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. In prior work, for each query, the answer is split into $\alpha^{-1} \log n$ numbers, each derived from summing $\log n$ interval-tree counting values. In contrast, we construct the tree differently, splitting the answer into $\log n$ numbers, where each is a smart combination of two distance values, two counting values, and $y$ itself. We believe our tree structure may be of independent interest.
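To make the core identity concrete, here is a minimal Python sketch (an illustration under assumed simplifications, not the paper's actual construction): in one dimension, $\sum_{x \in X} |x - y|$ can be assembled exactly from two counting values, two distance (prefix-sum) values, and $y$ itself, which is the kind of combination the abstract describes; a DP release would add calibrated noise to the stored counts and sums.

```python
import numpy as np

def preprocess(x):
    """Sort one private coordinate and store its prefix sums."""
    xs = np.sort(np.asarray(x, dtype=float))
    prefix = np.concatenate(([0.0], np.cumsum(xs)))  # prefix[i] = sum(xs[:i])
    return xs, prefix

def l1_sum(xs, prefix, y):
    """Exact sum_i |xs[i] - y| from two counts, two sums, and y itself."""
    c_left = int(np.searchsorted(xs, y, side="right"))  # points <= y
    s_left = prefix[c_left]                             # their total mass
    c_right = len(xs) - c_left                          # points > y
    s_right = prefix[-1] - s_left                       # their total mass
    return (c_left * y - s_left) + (s_right - c_right * y)

# Sanity check of the identity on random data.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 0.3
xs, prefix = preprocess(x)
assert np.isclose(l1_sum(xs, prefix, y), np.abs(x - y).sum())
```

For $d$ dimensions under $\|x - y\|_1$, the same identity applies coordinate-wise; how noise is injected into the tree nodes, and the resulting privacy and error analysis, is precisely where the paper's contribution lies and is not reproduced here.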
Related papers
- Profile Reconstruction from Private Sketches [13.929335175122265]
Given a multiset of $n$ items from $\mathcal{D}$, the profile reconstruction problem is to estimate, for $t = 0, 1, \dots, n$, the fraction $\vec{f}[t]$ of items in $\mathcal{D}$ that appear exactly $t$ times.
We consider differentially private profile estimation in a distributed, space-constrained setting where we wish to maintain an updatable, private sketch of the multiset.
We show how to speed up their LP-based technique.
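For orientation, here is a minimal non-private sketch (my own illustration; the paper's point is to estimate this vector from a private, space-constrained sketch) of the profile vector being reconstructed:

```python
from collections import Counter

def profile(multiset, domain_size):
    """Fraction f[t] of domain items appearing exactly t times, t = 0..n."""
    n = len(multiset)
    counts = Counter(multiset)               # item -> multiplicity
    hist = Counter(counts.values())          # multiplicity -> #distinct items
    f = [hist.get(t, 0) / domain_size for t in range(n + 1)]
    f[0] = (domain_size - len(counts)) / domain_size  # items never observed
    return f

# Domain {a, b, c, d}; "d" never appears, so f[0] = 1/4.
print(profile(["a", "a", "b", "c", "c", "c"], domain_size=4))
# [0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0]
```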
arXiv Detail & Related papers (2024-06-03T09:51:28Z) - Differentially Private Approximate Pattern Matching [0.0]
We consider the $k$-approximate pattern matching problem under differential privacy.
The goal is to report or count all approximate variants of a given string $S$ which have a Hamming distance at most $k$ to a pattern $P$.
We give 1) an $\epsilon$-differentially private algorithm with an additive error of $O(\epsilon^{-1}\log n)$ and no multiplicative error for the existence variant; 2) an $\epsilon$-differentially private algorithm with an additive error of $O(\epsilon^{-1} \ldots)$
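As a reference point, here is a minimal non-private implementation of the underlying counting task (my own sketch; the paper's contribution is releasing such counts privately, e.g., with calibrated additive noise):

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def count_k_approx_matches(S: str, P: str, k: int) -> int:
    """Number of length-|P| substrings of S within Hamming distance k of P."""
    m = len(P)
    return sum(hamming(S[i:i + m], P) <= k for i in range(len(S) - m + 1))

assert count_k_approx_matches("abracadabra", "abra", 1) == 2  # "abra" twice
assert count_k_approx_matches("abracadabra", "acad", 0) == 1
```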
arXiv Detail & Related papers (2023-11-13T15:53:45Z) - A spectral least-squares-type method for heavy-tailed corrupted
regression with unknown covariance & heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression, assuming access to a corrupted label-feature sample of size $n$ containing at most $\epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma, \Xi)$ nor the operator norm of $\Xi$.
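Since the estimator is described as power-method based, a generic power-iteration sketch may help (this is the standard textbook routine, shown for illustration only; the paper's estimator wraps such a step in robustness machinery not shown here):

```python
import numpy as np

def power_method(A, iters=200, seed=0):
    """Leading eigenvector/eigenvalue of a symmetric matrix via power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = A @ v                 # amplify the top eigendirection
        v /= np.linalg.norm(v)    # renormalize to avoid overflow
    return v, float(v @ A @ v)    # Rayleigh quotient estimates the top eigenvalue

A = np.array([[2.0, 1.0], [1.0, 3.0]])
v, lam = power_method(A)
print(lam)  # ~3.618, the top eigenvalue of A
```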
arXiv Detail & Related papers (2022-09-06T23:37:31Z) - Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound of $O(d^{\max\{1, p/2\}} \log^2 n)$ for loss functions with at most degree-$p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z) - Coresets for Decision Trees of Signals [19.537354146654845]
We provide the first algorithm that outputs such a $(k,\varepsilon)$-coreset for every such matrix $D$.
This is done by forging a link between decision trees from machine learning and partition trees from computational geometry.
arXiv Detail & Related papers (2021-10-07T05:49:55Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - On Avoiding the Union Bound When Answering Multiple Differentially
Private Queries [49.453751858361265]
We give an algorithm for this task that achieves an expected $\ell_\infty$ error bound of $O(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}})$.
On the other hand, the algorithm of Dagan and Kur has the remarkable advantage that the $\ell_\infty$ error bound of $O(\frac{1}{\epsilon}\sqrt{k \log \frac{1}{\delta}})$ holds not only in expectation but always.
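A quick Monte Carlo toy (my own illustration with independent Gaussian noise, not either paper's mechanism) showing why the maximum error over $k$ noisy answers concentrates near the $\sigma\sqrt{2\ln k}$ benchmark rather than growing linearly in $k$:

```python
import numpy as np

rng = np.random.default_rng(1)
k, trials = 1_000, 200
sigma = 1.0  # per-query noise scale; a DP calibration would set this from (eps, delta)

# Max absolute error over k independent noisy answers, averaged over trials.
linf = np.abs(rng.normal(scale=sigma, size=(trials, k))).max(axis=1)
print(linf.mean())                     # empirical expected l_inf error
print(sigma * np.sqrt(2 * np.log(k)))  # ~3.72, the sqrt(2 ln k) benchmark
```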
arXiv Detail & Related papers (2020-12-16T17:58:45Z) - Universal guarantees for decision tree induction via a higher-order
splitting criterion [16.832966312395126]
Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform distribution.
The crux of our extension is a new splitting criterion that takes into account the correlations between $f$ and small subsets of its attributes.
Our algorithm satisfies the following guarantee: for all target functions $f : \{-1,1\}^n \to \{-1,1\}$, sizes $s \in \mathbb{N}$, and error parameters $\epsilon$, it constructs a decision tree ...
arXiv Detail & Related papers (2020-10-16T21:20:45Z) - An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
We prove that for every decision tree, the absolute values of the Fourier coefficients of a given order $\ell$ sum to at most $c^{\ell} \sqrt{\binom{d}{\ell}} \, (1+\log n)^{\ell-1}$, where $n$ is the number of variables, $d$ is the tree depth, and $c > 0$ is an absolute constant.
arXiv Detail & Related papers (2020-08-24T06:50:57Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, a discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we give an improved algorithm with better sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$ upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with oracles and achieves complexity bounds of $\tilde{O}(n^2 + n\sqrt{L/\mu})\log(1/\epsilon)$ and $O(n\sqrt{L/\epsilon})$, for $\mu > 0$ and $\mu = 0$, respectively.
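For context, here is the textbook SVRG loop that the paper adapts (a standard reference implementation on my own toy least-squares setup, not the paper's oracle-compatible variant):

```python
import numpy as np

def svrg(grad_i, w0, n, lr=0.05, epochs=30, seed=0):
    """Textbook SVRG: periodic full-gradient snapshot + variance-reduced steps."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(n):  # one inner pass per snapshot
            i = rng.integers(n)
            g = grad_i(w, i) - grad_i(w_snap, i) + full  # variance-reduced gradient
            w -= lr * g
    return w

# Toy least-squares demo: minimize (1/n) sum_i 0.5 * (a_i @ w - b_i)^2.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
b = A @ w_true
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_hat = svrg(grad_i, np.zeros(5), n=100)
print(np.linalg.norm(w_hat - w_true))  # should be small
```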
arXiv Detail & Related papers (2020-02-09T03:39:46Z)