Related papers: Better Locally Private Sparse Estimation Given Multiple Samples Per User

URL: http://arxiv.org/abs/2408.04313v1
Date: Thu, 8 Aug 2024 08:47:20 GMT
Title: Better Locally Private Sparse Estimation Given Multiple Samples Per User
Authors: Yuheng Ma, Ke Jia, Hanfang Yang
Abstract summary: We investigate user-level locally differentially private sparse linear regression. We show that with $n$ users each contributing $m$ samples, the linear dependency of dimension $d$ can be eliminated. We propose a framework that first selects candidate variables and then conducts estimation in the narrowed low-dimensional space.
Score: 2.9562742331218725
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Previous studies yielded discouraging results for item-level locally differentially private linear regression with $s^*$-sparsity assumption, where the minimax rate for $nm$ samples is $\mathcal{O}(s^{*}d / nm\varepsilon^2)$. This can be challenging for high-dimensional data, where the dimension $d$ is extremely large. In this work, we investigate user-level locally differentially private sparse linear regression. We show that with $n$ users each contributing $m$ samples, the linear dependency of dimension $d$ can be eliminated, yielding an error upper bound of $\mathcal{O}(s^{*2} / nm\varepsilon^2)$. We propose a framework that first selects candidate variables and then conducts estimation in the narrowed low-dimensional space, which is extendable to general sparse estimation problems with tight error bounds. Experiments on both synthetic and real datasets demonstrate the superiority of the proposed methods. Both the theoretical and empirical results suggest that, with the same number of samples, locally private sparse estimation is better conducted when multiple samples per user are available.
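To make the gap between the two rates quoted in the abstract concrete, the following is a minimal sketch that evaluates both expressions for one hypothetical parameter setting. The function names and the example values are assumptions chosen for illustration, not taken from the paper; constants, logarithmic factors, and the conditions under which the bounds hold are ignored.

```python
import math


def item_level_rate(s, d, n, m, eps):
    """Order of the item-level rate s* d / (n m eps^2), constants dropped."""
    return s * d / (n * m * eps ** 2)


def user_level_rate(s, n, m, eps):
    """Order of the user-level upper bound s*^2 / (n m eps^2), constants dropped."""
    return s ** 2 / (n * m * eps ** 2)


# Hypothetical setting: sparsity 10, dimension 10,000, 1,000 users, 50 samples each.
s, d, n, m, eps = 10, 10_000, 1_000, 50, 1.0

item = item_level_rate(s, d, n, m, eps)
user = user_level_rate(s, n, m, eps)

# The ratio of the two expressions simplifies to d / s*, independent of n, m, eps.
ratio = item / user
print(f"item-level order: {item:.4g}")
print(f"user-level order: {user:.4g}")
print(f"ratio (should match d / s*): {ratio:.4g} vs {d / s:.4g}")
```

Because the ratio reduces to $d / s^{*}$, the difference between the two expressions matters most when the dimension is much larger than the sparsity level.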

Related papers

Nearly Optimal Differentially Private ReLU Regression [18.599299269974498]
We investigate one of the most fundamental nonconvex learning problems, ReLU regression, in the Differential Privacy (DP) model. We show that it is possible to achieve an upper bound of $\tilde{O}(\frac{d^2}{N^2 \varepsilon^2})$.
arXiv Detail & Related papers (2025-03-08T02:09:47Z)
Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality. We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures their bounds cannot be significantly improved. This is a novel statistical-computational trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z)
Distribution-Aware Mean Estimation under User-level Local Differential Privacy [5.267844649650687]
We consider the problem of mean estimation under user-level local differential privacy, where $n$ users are contributing through their local pool of data samples. Based on a distribution-aware mean estimation algorithm, we establish $M$-dependent upper bounds on the worst-case risk over $\mu$ for the task of mean estimation.
arXiv Detail & Related papers (2024-10-12T11:57:52Z)
Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression [4.396860522241307]
We show that an efficient learning algorithm for sparse linear regression can be used to solve sparse PCA problems with a negative spike. We complement our reduction with low-degree and statistical query lower bounds for the sparse problems from which we reduce.
arXiv Detail & Related papers (2024-02-21T19:55:01Z)
Weighted least-squares approximation with determinantal point processes and generalized volume sampling [33.33724208084121]
We consider the problem of approximating a function from $L^2$ by an element of a given $m$-dimensional space $V_m$. We show that the approximation is almost surely bounded by the best approximation error measured in the $H$-norm.
arXiv Detail & Related papers (2023-12-21T17:34:18Z)
Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures. This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notation -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z)
Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z)
Discrete Distribution Estimation under User-level Local Differential Privacy [37.65849910114053]
We study discrete distribution estimation under user-level local differential privacy (LDP). In user-level $\varepsilon$-LDP, each user has $m \ge 1$ samples and the privacy of all $m$ samples must be preserved simultaneously.
arXiv Detail & Related papers (2022-11-07T18:29:32Z)
Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model. The lower bound as the solution of an intricate non-convex optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
High Dimensional Differentially Private Stochastic Optimization with Heavy-tailed Data [8.55881355051688]
We provide the first study on the problem of DP-SCO with heavy-tailed data in the high dimensional space. We show that if the loss function is smooth and its gradient has bounded second order moment, it is possible to get a (high probability) error bound (excess population risk) of $\tilde{O}(\frac{\log d}{(n\epsilon)^{1/3}})$ in the $\epsilon$-DP model. In the second part of the paper, we study sparse learning with heavy-tailed data.
arXiv Detail & Related papers (2021-07-23T11:03:21Z)
Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$. We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. We show that the constraint of local differential privacy incurs an exponential increase in cost. Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.