PRIMO: Private Regression in Multiple Outcomes
- URL: http://arxiv.org/abs/2303.04195v1
- Date: Tue, 7 Mar 2023 19:32:13 GMT
- Title: PRIMO: Private Regression in Multiple Outcomes
- Authors: Seth Neel
- Abstract summary: We introduce a new differentially private regression setting we call Private Regression in Multiple Outcomes.
In Subsection $4.1$ we modify techniques based on sufficient statistics perturbation (SSP) to yield greatly improved dependence on $l$.
In Section $5$ we apply our algorithms to the task of private genomic risk prediction for multiple phenotypes using data from the 1000 Genomes project.
- Score: 4.111899441919164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a new differentially private regression setting we call Private
Regression in Multiple Outcomes (PRIMO), inspired by the common situation where a
data analyst wants to perform a set of $l$ regressions while preserving
privacy, where the covariates $X$ are shared across all $l$ regressions, and
each regression $i \in [l]$ has a different vector of outcomes $y_i$. While
naively applying private linear regression techniques $l$ times leads to a
$\sqrt{l}$ multiplicative increase in error over the standard linear regression
setting, in Subsection $4.1$ we modify techniques based on sufficient
statistics perturbation (SSP) to yield greatly improved dependence on $l$. In
Subsection $4.2$ we prove an equivalence to the problem of privately releasing
the answers to a special class of low-sensitivity queries we call inner product
queries. Via this equivalence, we adapt the geometric projection-based methods
from prior work on private query release to the PRIMO setting. Under the
assumption that the labels $Y$ are public, the projection gives improved results
over the Gaussian mechanism when $n < l\sqrt{d}$, with no asymptotic dependence
on $l$ in the error. In Subsection $4.3$ we study the complexity of our
projection algorithm, and analyze a faster sub-sampling based variant in
Subsection $4.4$. Finally in Section $5$ we apply our algorithms to the task of
private genomic risk prediction for multiple phenotypes using data from the
1000 Genomes project. We find that for moderately large values of $l$ our
techniques drastically improve the accuracy relative to both the naive baseline
that uses existing private regression methods and our modified SSP algorithm
that doesn't use the projection.
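
To make the SSP idea concrete, below is a minimal NumPy sketch of sufficient statistics perturbation in the PRIMO setting. The function name, clipping bounds, and Gaussian noise calibration are illustrative assumptions rather than the paper's algorithm; the point is structural: the noisy $X^TX$ is computed and paid for once, while each of the $l$ outcomes only adds a noisy $X^Ty_i$.

```python
import numpy as np

def ssp_primo_sketch(X, Y, eps, delta, clip_x=1.0, clip_y=1.0, rng=None):
    """Illustrative SSP-style private regression for PRIMO (hypothetical
    helper, not the paper's algorithm): privatize X^T X once, shared by
    all l regressions, then privatize X^T y_i per outcome."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    l = Y.shape[1]  # number of outcome vectors sharing the covariates X

    # Gaussian-mechanism scale for unit L2 sensitivity; rows of X and
    # entries of Y are assumed pre-clipped to norms clip_x and clip_y.
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / eps

    # One noisy copy of X^T X serves every regression, so this part of
    # the privacy budget does not grow with l.
    E = rng.normal(0.0, sigma * clip_x**2, size=(d, d))
    XtX_priv = X.T @ X + (E + E.T) / 2.0  # symmetrize the perturbation

    # Each outcome contributes its own noisy X^T y_i; in the naive
    # analysis this loop is where the sqrt(l) error blow-up comes from.
    ridge = 1e-3 * np.eye(d)  # small regularizer to keep the solve stable
    betas = np.empty((l, d))
    for i in range(l):
        xty = X.T @ Y[:, i] + rng.normal(0.0, sigma * clip_x * clip_y, size=d)
        betas[i] = np.linalg.solve(XtX_priv + ridge, xty)
    return betas
```

This naive accounting ignores composition across the $l+1$ releases; sharpening exactly that dependence on $l$ is what the paper's Section 4 techniques are for.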
Related papers
- On Differentially Private U Statistics [25.683071759227293]
We propose a new thresholding-based approach using local Hájek projections to reweight different subsets of the data.
This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
arXiv Detail & Related papers (2024-07-06T03:27:14Z)
- Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z)
- Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations [51.14495595270775]
We adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively.
Our results demonstrate that this procedure can reduce runtime by a factor of up to $2{,}200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
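
For context on why Frank-Wolfe pairs well with sparse inputs, here is a plain, non-private Frank-Wolfe sketch for least squares over the $L_1$ ball. The function name and step-size schedule are standard textbook choices, not taken from the paper; a differentially private variant would typically also randomize the coordinate selection (e.g., with a noisy-max step), which is where $\epsilon$ enters the trade-off the summary quotes.

```python
import numpy as np

def frank_wolfe_l1(X, y, radius=1.0, steps=200):
    """Textbook (non-private) Frank-Wolfe for least squares over the
    L1 ball {w : ||w||_1 <= radius}. Each iterate updates a single
    coordinate, which is what makes sparse inputs pay off."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(steps):
        grad = X.T @ (X @ w - y) / n      # gradient of (1/2n)||Xw - y||^2
        j = int(np.argmax(np.abs(grad)))  # LMO: best vertex of the L1 ball
        vertex = -radius * np.sign(grad[j])
        gamma = 2.0 / (t + 2.0)           # standard Frank-Wolfe step size
        w *= 1.0 - gamma                  # convex step toward vertex * e_j
        w[j] += gamma * vertex
    return w
```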
arXiv Detail & Related papers (2023-10-30T19:52:43Z)
- Efficient Conditionally Invariant Representation Learning [41.320360597120604]
We propose the Conditional Independence Regression CovariancE (CIRCE).
Prior measures of conditional feature dependence require multiple regressions for each step of feature learning.
In experiments, we show superior performance to previous methods on challenging benchmarks.
arXiv Detail & Related papers (2022-12-16T18:39:32Z)
- The Projected Covariance Measure for assumption-lean variable significance testing [3.8936058127056357]
A simple but common approach is to specify a linear model, and then test whether the regression coefficient for $X$ is non-zero.
We study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$.
We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests.
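
As a rough sketch of the residual-based recipe such tests follow (this is closer to the classical generalized covariance measure than to the Projected Covariance Measure itself; the learner choice, 1-D variables, and function name are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def residual_product_test(X, Y, Z, seed=0):
    """Test the null that E[Y | X, Z] does not depend on X by regressing
    both Y and X on Z with a flexible learner and correlating residuals.
    Under the null the statistic is approximately standard normal."""
    fy = RandomForestRegressor(random_state=seed).fit(Z, Y)
    fx = RandomForestRegressor(random_state=seed).fit(Z, X)
    ry = Y - fy.predict(Z)   # part of Y not explained by Z
    rx = X - fx.predict(Z)   # part of X not explained by Z
    prod = rx * ry
    return np.sqrt(len(prod)) * prod.mean() / prod.std(ddof=1)
```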
arXiv Detail & Related papers (2022-11-03T17:55:50Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Statistical Query Lower Bounds for List-Decodable Linear Regression [55.06171096484622]
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples.
Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem.
arXiv Detail & Related papers (2021-06-17T17:45:21Z)
- Outlier-robust sparse/low-rank least-squares regression and robust matrix completion [1.0878040851637998]
We study high-dimensional least-squares regression within a subgaussian statistical learning framework with heterogeneous noise.
We also present a novel theory of trace-regression with matrix decomposition based on a new application of the product process.
arXiv Detail & Related papers (2020-12-12T07:42:47Z)
- Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression [72.84177488527398]
We consider sparse regression from the perspective of correlation and propose a formula for conditional uncorrelation.
By the proposed method, the computational complexity is reduced from $O(\frac{1}{6}k^3+mk^2+mkd)$ to $O(\frac{1}{6}k^3+\frac{1}{2}mk^2)$ for each candidate subset in sparse regression.
arXiv Detail & Related papers (2020-09-08T20:32:26Z)
- Truncated Linear Regression in High Dimensions [26.41623833920794]
In truncated linear regression, we observe samples $(A_i, y_i)_i$ whose dependent variable equals $y_i = A_i^{\mathrm{T}} \cdot x^* + \eta_i$, where $x^*$ is some fixed unknown vector of interest and $\eta_i$ is noise.
The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution.
We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples.
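
A quick simulation makes the difficulty concrete. This snippet (an illustration with an assumed truncation rule, not the paper's estimator) shows that ordinary least squares on the surviving samples is biased:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20000, 5
x_star = rng.normal(size=d)
A = rng.normal(size=(n, d))
y = A @ x_star + rng.normal(size=n)

keep = y > 0  # assumed truncation rule: a sample survives only if y_i > 0
x_ols = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]

# Selecting on y correlates the noise with A among survivors, so plain
# OLS is inconsistent; the error below stays bounded away from zero.
print(np.linalg.norm(x_ols - x_star))
```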
arXiv Detail & Related papers (2020-07-29T00:31:34Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.