Differentially Private Bayesian Inference for Generalized Linear Models
- URL: http://arxiv.org/abs/2011.00467v3
- Date: Wed, 12 May 2021 08:57:39 GMT
- Title: Differentially Private Bayesian Inference for Generalized Linear Models
- Authors: Tejas Kulkarni, Joonas Jälkö, Antti Koskela, Samuel Kaski and
  Antti Honkela
- Abstract summary: Generalized linear models (GLMs) such as logistic regression are among the most widely used tools in a data analyst's repertoire.
We introduce a generic noise-aware DP Bayesian inference method for a GLM, given a noisy sum of summary statistics.
We experimentally demonstrate that the posteriors obtained from our model, while adhering to strong privacy guarantees, are close to the non-private posteriors.
- Score: 21.942149563564968
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalized linear models (GLMs) such as logistic regression are among the
most widely used tools in a data analyst's repertoire and are often applied to sensitive
datasets. A large body of prior work investigating GLMs under differential
privacy (DP) constraints provides only private point estimates of the regression
coefficients and is not able to quantify parameter uncertainty. In this work,
with logistic and Poisson regression as running examples, we introduce a
generic noise-aware DP Bayesian inference method for a GLM at hand, given a
noisy sum of summary statistics. Quantifying uncertainty allows us to determine
which of the regression coefficients are statistically significantly different
from zero. We provide a previously unknown tight privacy analysis and
experimentally demonstrate that the posteriors obtained from our model, while
adhering to strong privacy guarantees, are close to the non-private posteriors.
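The abstract describes releasing a noisy sum of per-record summary statistics and then performing noise-aware Bayesian inference on top of that release. The sketch below is an illustration of the release step only, not the authors' implementation: the choice of statistics, the clipping bound clip_norm, and the noise scale noise_std are assumptions made for the example, with logistic-regression-style statistics used for concreteness.

import numpy as np

def noisy_summary_statistics(X, y, clip_norm=1.0, noise_std=1.0, rng=None):
    """Clip each record of X to L2 norm `clip_norm`, form sums of simple
    summary statistics, and perturb them with Gaussian noise.

    A noise-aware Bayesian model would then treat the returned quantities as
    noisy observations of the true sums, rather than as exact statistics.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Bound each record's contribution so the sums have finite sensitivity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_clipped = X * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # Example summary statistics: sum_i y_i * x_i and sum_i x_i x_i^T.
    s_xy = X_clipped.T @ y              # shape (d,)
    s_xx = X_clipped.T @ X_clipped      # shape (d, d)

    # Gaussian mechanism: add noise calibrated to the per-record sensitivity
    # implied by clip_norm (the exact calibration to (eps, delta) is omitted).
    noisy_xy = s_xy + rng.normal(scale=noise_std, size=s_xy.shape)
    noisy_xx = s_xx + rng.normal(scale=noise_std, size=s_xx.shape)
    return noisy_xy, noisy_xx

Calibrating noise_std to the L2 sensitivity induced by the clipping bound yields the DP guarantee; the noise-aware posterior over the GLM coefficients then models this added perturbation explicitly when conditioning on the released sums.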
Related papers
- Interval Estimation of Coefficients in Penalized Regression Models of Insurance Data [3.5637073151604093]
The Tweedie exponential dispersion family is a popular choice for modeling insurance losses.
It is often important to obtain credibility intervals (inference) for the most important features that describe the endogenous variables.
arXiv Detail & Related papers (2024-10-01T18:57:18Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We analyze the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high-dimensional datasets.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Inference at the data's edge: Gaussian processes for modeling and inference under model-dependency, poor overlap, and extrapolation [0.0]
The Gaussian Process (GP) is a flexible non-linear regression approach.
It provides a principled approach to handling our uncertainty over predicted (counterfactual) values.
This is especially valuable under conditions of extrapolation or weak overlap.
arXiv Detail & Related papers (2024-07-15T05:09:50Z) - Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z) - Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is an increasingly adopted tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z) - Differentially Private Statistical Inference through $\beta$-Divergence
One Posterior Sampling [2.8544822698499255]
We propose a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process.
This provides private estimation that is generally applicable without requiring changes to the underlying model.
We show that $\beta$D-Bayes produces more precise inference for the same privacy guarantees.
arXiv Detail & Related papers (2023-07-11T12:00:15Z) - Differentially Private Distributed Bayesian Linear Regression with MCMC [0.966840768820136]
We consider a distributed setting where multiple parties hold parts of the data and share certain summary statistics of their portions after adding privacy-preserving noise.
We develop a novel generative statistical model for privately shared statistics, which exploits a useful distributional relation between the summary statistics of linear regression.
We provide numerical results on both real and simulated data, which demonstrate that the proposed algorithms provide well-rounded estimation and prediction.
arXiv Detail & Related papers (2023-01-31T17:27:05Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware
Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model uncertainty within present-day regression techniques remains an open issue.
arXiv Detail & Related papers (2021-03-25T06:56:09Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.