PRIMO: Private Regression in Multiple Outcomes
- URL: http://arxiv.org/abs/2303.04195v1
- Date: Tue, 7 Mar 2023 19:32:13 GMT
- Title: PRIMO: Private Regression in Multiple Outcomes
- Authors: Seth Neel
- Abstract summary: We introduce a new differentially private regression setting we call Private Regression in Multiple Outcomes.
In Subsection $4.1$ we modify techniques based on sufficient statistics perturbation (SSP) to yield greatly improved dependence on $l$.
In Section $5$ we apply our algorithms to the task of private genomic risk prediction for multiple phenotypes using data from the 1000 Genomes project.
- Score: 4.111899441919164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a new differentially private regression setting we call Private
Regression in Multiple Outcomes (PRIMO), inspired by the common situation where a
data analyst wants to perform a set of $l$ regressions while preserving
privacy, where the covariates $X$ are shared across all $l$ regressions, and
each regression $i \in [l]$ has a different vector of outcomes $y_i$. While
naively applying private linear regression techniques $l$ times leads to a
$\sqrt{l}$ multiplicative increase in error over the standard linear regression
setting, in Subsection $4.1$ we modify techniques based on sufficient
statistics perturbation (SSP) to yield greatly improved dependence on $l$. In
Subsection $4.2$ we prove an equivalence to the problem of privately releasing
the answers to a special class of low-sensitivity queries we call inner product
queries. Via this equivalence, we adapt the geometric projection-based methods
from prior work on private query release to the PRIMO setting. Under the
assumption that the labels $Y$ are public, the projection gives improved results
over the Gaussian mechanism when $n < l\sqrt{d}$, with no asymptotic dependence
on $l$ in the error. In Subsection $4.3$ we study the complexity of our
projection algorithm, and analyze a faster sub-sampling based variant in
Subsection $4.4$. Finally in Section $5$ we apply our algorithms to the task of
private genomic risk prediction for multiple phenotypes using data from the
1000 Genomes project. We find that for moderately large values of $l$ our
techniques drastically improve the accuracy relative to both the naive baseline
that uses existing private regression methods and our modified SSP algorithm
that doesn't use the projection.
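
To make the SSP idea concrete, below is a minimal NumPy sketch of sufficient statistics perturbation in the PRIMO setting. The function name, clipping bounds, and Gaussian noise calibration are illustrative assumptions rather than the paper's algorithm; the point is structural: the noisy $X^TX$ is computed and paid for once, while each of the $l$ outcomes only adds a noisy $X^Ty_i$.

```python
import numpy as np

def ssp_primo_sketch(X, Y, eps, delta, clip_x=1.0, clip_y=1.0, rng=None):
    """Illustrative SSP-style private regression for PRIMO (hypothetical
    helper, not the paper's algorithm): privatize X^T X once, shared by
    all l regressions, then privatize X^T y_i per outcome."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    l = Y.shape[1]  # number of outcome vectors sharing the covariates X

    # Gaussian-mechanism scale for unit L2 sensitivity; rows of X and
    # entries of Y are assumed pre-clipped to norms clip_x and clip_y.
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / eps

    # One noisy copy of X^T X serves every regression, so this part of
    # the privacy budget does not grow with l.
    E = rng.normal(0.0, sigma * clip_x**2, size=(d, d))
    XtX_priv = X.T @ X + (E + E.T) / 2.0  # symmetrize the perturbation

    # Each outcome contributes its own noisy X^T y_i; in the naive
    # analysis this loop is where the sqrt(l) error blow-up comes from.
    ridge = 1e-3 * np.eye(d)  # small regularizer to keep the solve stable
    betas = np.empty((l, d))
    for i in range(l):
        xty = X.T @ Y[:, i] + rng.normal(0.0, sigma * clip_x * clip_y, size=d)
        betas[i] = np.linalg.solve(XtX_priv + ridge, xty)
    return betas
```

This naive accounting ignores composition across the $l+1$ releases; sharpening exactly that dependence on $l$ is what the paper's Section 4 techniques are for.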
Related papers
- On Differentially Private U Statistics [25.683071759227293]
We propose a new thresholding-based approach using local Hájek projections to reweight different subsets of the data.
This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
arXiv Detail & Related papers (2024-07-06T03:27:14Z)
- Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z)
- Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations [51.14495595270775]
We adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively.
Our results demonstrate that this procedure can reduce runtime by a factor of up to $2{,}200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
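
For context on why Frank-Wolfe pairs well with sparse inputs, here is a plain, non-private Frank-Wolfe sketch for least squares over the $L_1$ ball. The function name and step-size schedule are standard textbook choices, not taken from the paper; a differentially private variant would typically also randomize the coordinate selection (e.g., with a noisy-max step), which is where $\epsilon$ enters the trade-off the summary quotes.

```python
import numpy as np

def frank_wolfe_l1(X, y, radius=1.0, steps=200):
    """Textbook (non-private) Frank-Wolfe for least squares over the
    L1 ball {w : ||w||_1 <= radius}. Each iterate updates a single
    coordinate, which is what makes sparse inputs pay off."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(steps):
        grad = X.T @ (X @ w - y) / n      # gradient of (1/2n)||Xw - y||^2
        j = int(np.argmax(np.abs(grad)))  # LMO: best vertex of the L1 ball
        vertex = -radius * np.sign(grad[j])
        gamma = 2.0 / (t + 2.0)           # standard Frank-Wolfe step size
        w *= 1.0 - gamma                  # convex step toward vertex * e_j
        w[j] += gamma * vertex
    return w
```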
arXiv Detail & Related papers (2023-10-30T19:52:43Z)
- Efficient Conditionally Invariant Representation Learning [41.320360597120604]
We propose the Conditional Independence Regression CovariancE (CIRCE).
Prior measures of conditional feature dependence require multiple regressions for each step of feature learning.
In experiments, we show superior performance to previous methods on challenging benchmarks.
arXiv Detail & Related papers (2022-12-16T18:39:32Z)
- The Projected Covariance Measure for assumption-lean variable significance testing [3.8936058127056357]
A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero.
We study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$.
We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests.
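
As a rough sketch of the residual-based recipe such tests follow (this is closer to the classical generalized covariance measure than to the Projected Covariance Measure itself; the learner choice, 1-D variables, and function name are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def residual_product_test(X, Y, Z, seed=0):
    """Test the null that E[Y | X, Z] does not depend on X by regressing
    both Y and X on Z with a flexible learner and correlating residuals.
    Under the null the statistic is approximately standard normal."""
    fy = RandomForestRegressor(random_state=seed).fit(Z, Y)
    fx = RandomForestRegressor(random_state=seed).fit(Z, X)
    ry = Y - fy.predict(Z)   # part of Y not explained by Z
    rx = X - fx.predict(Z)   # part of X not explained by Z
    prod = rx * ry
    return np.sqrt(len(prod)) * prod.mean() / prod.std(ddof=1)
```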
arXiv Detail & Related papers (2022-11-03T17:55:50Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Statistical Query Lower Bounds for List-Decodable Linear Regression [55.06171096484622]
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples.
Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem.
arXiv Detail & Related papers (2021-06-17T17:45:21Z)
- Outlier-robust sparse/low-rank least-squares regression and robust matrix completion [1.0878040851637998]
We study high-dimensional least-squares regression within a subgaussian statistical learning framework with heterogeneous noise.
We also present a novel theory of trace-regression with matrix decomposition based on a new application of the product process.
arXiv Detail & Related papers (2020-12-12T07:42:47Z)
- Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression [72.84177488527398]
We consider sparse regression from the perspective of correlation and propose a formula for conditional uncorrelation.
By the proposed method, the computational complexity is reduced from $O(\frac{1}{6}k^3+mk^2+mkd)$ to $O(\frac{1}{6}k^3+\frac{1}{2}mk^2)$ for each candidate subset in sparse regression.
arXiv Detail & Related papers (2020-09-08T20:32:26Z)
- Truncated Linear Regression in High Dimensions [26.41623833920794]
In truncated linear regression, we observe samples $(A_i, y_i)_i$ whose dependent variable equals $y_i = A_i^{\mathrm{T}} \cdot x^* + \eta_i$, where $x^*$ is some fixed unknown vector of interest and $\eta_i$ is noise.
The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution.
We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples.
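
A quick simulation makes the difficulty concrete. This snippet (an illustration with an assumed truncation rule, not the paper's estimator) shows that ordinary least squares on the surviving samples is biased:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20000, 5
x_star = rng.normal(size=d)
A = rng.normal(size=(n, d))
y = A @ x_star + rng.normal(size=n)

keep = y > 0  # assumed truncation rule: a sample survives only if y_i > 0
x_ols = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]

# Selecting on y correlates the noise with A among survivors, so plain
# OLS is inconsistent; the error below stays bounded away from zero.
print(np.linalg.norm(x_ols - x_star))
```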
arXiv Detail & Related papers (2020-07-29T00:31:34Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.